THE SYSTEMATIC ERROR OF HERRING-BINET IN
RATING GIFTED CHILDREN


HERBERT A. CARROLL AND LETA S. HOLLINGWORTH


Teachers College, Columbia University 


THE HERRING-BINET AS AN ALTERNATE TO THE STANFORD-BINET


There is a general belief that the Herring-Binet is an excellent 
alternate for the Stanford-Binet. The early statement by Herring 
that “We are bound to interpret the Pearson r of .991 as meaning that
there is very little to choose between the Stanford-Binet and the
Herring-Binet as far as reliability and validity are concerned,”¹ seems
to have been accepted at its face value. The Herring test, using the 
Stanford as its criterion and composed of very similar material, would 
appear, without critical examination, to be wholly satisfactory as a 
substitute for the Stanford in all cases. This paper is concerned with 
its value as a measuring instrument in work with gifted children. 


MARKED DIFFERENCES IN IQ 


Eighty children, between the ages of seven and twelve, selected 
at various times, from various sources, on preliminary indications of 
brightness, were found to have IQ’s on the Stanford-Binet ranging 
from 133 to 190, with a mean at 150.4. These children were retested 
with the Herring with rather startling results. A constant difference 
of -17.2 points in IQ was found for the group.² Following up these
findings, fifty-two of the eighty subjects mentioned above were





1 Herring, J. P.: “Herring Revision of the Binet-Simon Tests and Verbal and
Abstract Elements in Intelligence Examinations.” World Book Co., Yonkers,
1924, p. 13.

2 In this discussion a minus sign is used to indicate that the Herring score is
less than the Stanford, and a plus sign that it is more than the Stanford.
Fig. 1. Showing the comparative rating of a group of 80 gifted children on the
Stanford-Binet and Herring-Binet.

Fig. 2. A comparison of individual scores of 80 gifted children on Stanford-Binet and
Herring-Binet. (Comparative scores of every fifth child indicated.)

TABLE I. SHOWING THE MARKED DIFFERENCE BETWEEN STANFORD-BINET AND
HERRING-BINET IQ’S FOR A GROUP OF GIFTED CHILDREN

        First test           Second test          Average of both tests
Child   S-B1  H-B1  Diff.    S-B2  H-B2  Diff.    S-B1,2  H-B1,2   Diff.
    1    141   125    -16     144   133    -11     142.5     129   -13.5
    2    135   133     -2     167   143    -24       151     138     -13
    3    137   121    -16
    4    141   151    +10     163   149    -14       152     150      -2
    5    167   151    -16     177   173     -4       172     162     -10
    6    156   145    -11     151   137    -14     153.5     141   -12.5
    7    138   121    -17     150   116    -34       144   118.5   -25.5
    8    139   108    -31     154   124    -30     146.5     116   -30.5
    9    145   125    -20     153   138    -15       149   131.5   -17.5
   10    154   140    -14     174   159    -15       164   149.5   -14.5
   11    143   123    -20
   12    142   134     -8     149   139    -10     145.5   136.5      -9
   13    145   108    -37
   14    167   151    -16     175   159    -16       171     155     -16
   15    146   134    -12     137   142     +5     141.5     138    -3.5
   16    150   118    -32     156   121    -35       153   119.5   -33.5
   17    151   133    -18     147   119    -28       149     126     -23
   18    149   146     -3     170   153    -17     159.5   149.5     -10
   19    144   113    -31     135   124    -11     139.5   118.5     -21
   20    163   134    -29     160   153     -7     161.5   143.5     -18
   21    168   140    -28
   22    144   134    -10
   23    157   130    -27
   24    154   125    -29     146   117    -29       150     121     -29
   25    135   127     -8     137   121    -16       136     124     -12
   26    143   143      0     143   140     -3       143   141.5    -1.5
   27    153   146     -7     163   143    -20       158   144.5   -13.5
   28    156   147     -9
   29    154   140    -14     144   150     +6       149     145      -4
   30    152   135    -17     146   147     +1       149     141      -8
   31    160   121    -39     159   139    -20     159.5     130   -29.5
   32    172   161    -11     180   144    -36       176   152.5   -23.5
   33    175   137    -38     181   151    -30       178     144     -34
   34    157   130    -27
   35    149   131    -18     128   131     +3     138.5     131    -7.5
   36    190   174    -16     188   169    -19       189   171.5   -17.5
   37    156   130    -26     162   128    -34       159     129     -30
   38    141   141      0
   39    166   151    -15     174   147    -27       170     149     -21
   40    171   169     -2     168   150    -18     169.5   159.5     -10
   41    156   136    -20     170   139    -31       163   137.5   -25.5
   42    188   159    -29     182   162    -20       185   160.5   -24.5
   43    177   142    -35     171   140    -31       174     141     -33
   44    141   116    -25     155   126    -29       148     121     -27
   45    160   144    -16     152   155     +3       156   149.5    -6.5
   46    133   119    -14     160   123    -37     146.5     121   -25.5
   47    145   116    -29     149   130    -19       147     123     -24
   48    134   131     -3
   49    138   129     -9     141   125    -16     139.5     127   -12.5
   50    137   120    -17     150   134    -16     143.5     127   -16.5
   51    135   115    -20     146   120    -26     140.5   117.5     -23
   52    149   136    -13     160   139    -21     154.5   137.5     -17
   53    156   135    -21     164   145    -19       160     140     -20
   54    162   147    -15     170   138    -32       166   142.5   -23.5
   55    164   127    -37     159   145    -14     161.5     136   -25.5
   56    162   136    -26     159   137    -22     160.5   136.5     -24
   57    140   123    -17     152   129    -23       146     126     -20
   58    154   126    -28     157   137    -20     155.5   131.5     -24
   59    147   135    -12     144   120    -24     145.5   127.5     -18
   60    157   133    -24     162   147    -15     159.5     140   -19.5
   61    139   125    -14     147   127    -20       143     126     -17
   62    164   136    -28
   63    158   128    -30
   64    141   142     +1     153   129    -24       147   135.5   -11.5
   65    164   134    -30
   66    151   127    -24
   67    145   129    -16
   68    133   114    -19
   69    152   140    -12
   70    143   119    -24
   71    135   127     -8
   72    156   141    -15
   73    139   127    -12
   74    134   122    -12
   75    143   139     -4
   76    133   120    -13
   77    140   143     +3
   78    140   131     -9
   79    141   143     +2
   80    139   129    -10

Average difference     17.4          (illegible)                    18.3
Constant difference   -17.2          (illegible)                   -18.3

retested one year later with both the Stanford and the Herring. The 
second set of results checked very closely with the first. (See Table I 
and Figures 1 and 2.) 

It is interesting to note that the Herring IQ exceeds the Stanford 
in only four cases in the first test, five in the second, and in no instance 
when the two sets are combined. The former runs consistently lower 
—very much lower. 
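The two summary figures for the first test can be recomputed from the rows of Table I. The following is a minimal sketch, assuming the 80 first-test pairs as transcribed here are correct; the names `FIRST_TEST` and `summarize` are illustrative only, and a figure recomputed from a transcription may differ slightly from the printed summary row.

```python
# First-test IQ pairs (Stanford-Binet, Herring-Binet) for children 1-80,
# transcribed from Table I. Transcription slips are possible.
FIRST_TEST = [
    (141, 125), (135, 133), (137, 121), (141, 151), (167, 151),
    (156, 145), (138, 121), (139, 108), (145, 125), (154, 140),
    (143, 123), (142, 134), (145, 108), (167, 151), (146, 134),
    (150, 118), (151, 133), (149, 146), (144, 113), (163, 134),
    (168, 140), (144, 134), (157, 130), (154, 125), (135, 127),
    (143, 143), (153, 146), (156, 147), (154, 140), (152, 135),
    (160, 121), (172, 161), (175, 137), (157, 130), (149, 131),
    (190, 174), (156, 130), (141, 141), (166, 151), (171, 169),
    (156, 136), (188, 159), (177, 142), (141, 116), (160, 144),
    (133, 119), (145, 116), (134, 131), (138, 129), (137, 120),
    (135, 115), (149, 136), (156, 135), (162, 147), (164, 127),
    (162, 136), (140, 123), (154, 126), (147, 135), (157, 133),
    (139, 125), (164, 136), (158, 128), (141, 142), (164, 134),
    (151, 127), (145, 129), (133, 114), (152, 140), (143, 119),
    (135, 127), (156, 141), (139, 127), (134, 122), (143, 139),
    (133, 120), (140, 143), (140, 131), (141, 143), (139, 129),
]


def summarize(pairs):
    """Return (n, constant difference, average difference, cases with H-B > S-B)."""
    # A minus difference means the Herring score is below the Stanford score.
    diffs = [hb - sb for sb, hb in pairs]
    n = len(diffs)
    constant = sum(diffs) / n                 # signed mean (constant difference)
    average = sum(abs(d) for d in diffs) / n  # mean regardless of sign
    higher = sum(1 for d in diffs if d > 0)   # Herring exceeds Stanford
    return n, constant, average, higher


n, constant, average, higher = summarize(FIRST_TEST)
print(f"n = {n}")
print(f"constant difference = {constant:+.2f}")
print(f"average difference  = {average:.2f}")
print(f"Herring exceeds Stanford in {higher} cases")
```

Because only a few differences are positive, the signed and unsigned means lie close together, which is what the text means by a constant rather than a variable discrepancy.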

In this connection we looked up Herring’s original data,¹ and
found that here as well there was a constant minus difference, though 
much less than in our findings. Nine of his one hundred fifty-four 
original subjects (of various ages) fell within the Stanford-Binet IQ 
range considered in this discussion. It is significant that of these 





1 Ibid., pp. 68-71. 
nine, only one tested higher on the Herring. One other was the same
on both, and the remaining seven were from one to thirteen IQ points 
lower on the Herring. 


CHOICE OF A CRITERION OF VALIDITY 


Since Herring-Binet yields such different results from Stanford- 
Binet when used with gifted children, as proved above, the question 
immediately follows: Which of the two is the more nearly valid as an 
instrument of prediction? 


Fortunately we have at hand data to show the comparative validity
of these two instruments, insofar as power to predict scholastic
performance is concerned. These data consist of complete scholastic
records of a group of gifted children who were for three years under
experimental observation.¹


The children were selected in 1922 by a variety of preliminary
indications of brightness, being ultimately tested by Stanford-Binet,
and if found to test above 135 IQ (S-B) finally chosen for the
experimental group. The three conditions of admissibility to this group
were that a child must (1) be above 135 IQ on first test by
Stanford-Binet; (2) be at least seven but not yet nine years old; (3) have
consent of parents to join special classes.

Having been admitted to the experimental group, already described 
elsewhere,²,³ each child was given a Herring-Binet test, during 1922 or


1 These children were selected and instructed under the supervision of a joint 
committee consisting of Mr. Jacob Theobald and Miss Jane Monahan, of Public 
School 165, Manhattan; Miss Margaret V. Cobb, Dr. Grace A. Taylor and Prof. 
Leta S. Hollingworth, of Teachers College, Columbia University, with the coopera- 
tion of District Superintendent John E. Wade, and of the Institute of Educational 
Research, Teachers College. The data here cited in connection with our criterion 
of validity, were collected in part by means of funds granted by The Carnegie 
Corporation of New York. 

It should be especially mentioned that the achievement tests and also many of 
the mental tests, were given by Miss Cobb, during the course of experimentation 
undertaken by the committee, and that we have consulted with Miss Cobb in 
connection with the present study. 

2 Cobb, M. V., and Taylor, G. A.: “Stanford Achievement Tests with a Group
of Gifted Children.” Twenty-third Yearbook of the National Society for the
Study of Education. Public School Publishing Co., Bloomington, Ill., 1924.

3 Hollingworth, L. S., and Cobb, M. V.: “Children Clustering at 165 IQ and
Children Clustering at 146 IQ Compared for Three Years in Achievement.”
Twenty-seventh Yearbook of the National Society for the Study of Education.
Public School Publishing Co., Bloomington, Ill., 1928.
early in 1923. Also, one year after the first Stanford-Binet this test
was repeated with each child, and this was done also in the case of 
Herring-Binet. 

Furthermore all of these children took numerous standardized 
tests of scholastic achievement, over the experimental period of three 
years.¹,² From numerous and various criteria of validity thus made
available, the present investigators chose the EQ revealed by the 
Stanford Achievement Test, Form B, administered to the group on 
June 5, 1924, and scored by the revised age-norms. This test was 
selected to form our criterion because (1) it consists of a battery of 
tests drawing upon a wide variety of skills and knowledges which 
underlie scholastic success; (2) it was administered on a date when the 
children of the group had for two years had exceptional opportunity 
to learn in accordance with their ability, and yet when they were still 
so young that they did not exceed the limitations of this scale; (3) 
it has been standardized on actual and appropriate populations of 
school children, with age-norms as well as grade-norms; (4) it shows 
us achievement subsequent to the mental tests themselves, which 
instruments of prediction must foretell. Furthermore, inspection of 
the graphs presented in another connection² will show that this battery
of tests is excellently representative of comparative scholastic achieve- 
ment as variously measured. 

There were available forty children who took the Stanford Achieve- 
ment Test, Form B, in full on June 5, 1924; who had had two Stanford- 
Binets a year apart, and two Herring-Binets a year apart; who had not 
yet exceeded the limits of any of these tests when given. The achieve- 
ment (EQ’s) of these forty children (all testing in the top percentile by 
Stanford-Binet in 1922) constitutes the criterion of validity by which the 
two instruments being compared are proved. 
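The criterion quotient can be illustrated from the ages given in Table II. This is a minimal sketch, assuming the conventional definition EQ = 100 × EA / CA with both ages in months; the paper does not spell the formula out here, and the helper names `months` and `eq` are hypothetical. The three rows used (children 35, 4 and 20) reproduce the tabled EQ under this assumption.

```python
def months(age: str) -> int:
    """Convert an age written as 'years-months' (e.g. '9-11') to months."""
    years, extra = age.split("-")
    return int(years) * 12 + int(extra)


def eq(ca: str, ea: str) -> int:
    """Educational quotient, assumed here to be 100 * EA / CA, rounded."""
    return round(100 * months(ea) / months(ca))


# (child, chronological age, educational age) taken from Table II
rows = [(35, "9-11", "16-1"), (4, "10-5", "17-0"), (20, "9-7", "16-0")]
for child, ca, ea in rows:
    print(f"child {child}: CA {ca}, EA {ea}, EQ {eq(ca, ea)}")
```

A few other rows of the table differ from this computation by a point or two, so the sketch should be read as an illustration of the quotient and not as the authors' exact scoring procedure.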


APPLICATION OF THE CRITERION 


Table II represents in full data showing the comparative validity 
of Stanford-Binet and of Herring-Binet in use with gifted children. 
Minus errors indicate that the test of intelligence underpredicted
subsequent performance; plus errors indicate overprediction. The
children concerned are listed individually by means of the
identification numbers used in a previous study made for a different purpose,³





1 Cobb, M. V., and Taylor, G. A.: Loc. cit.
2 Hollingworth, L. S., and Cobb, M. V.: Loc. cit.
3 Cobb, M. V., and Taylor, G. A.: Loc. cit.
TABLE II. [beginning of title missing] COMPARATIVELY FOR VALIDITY WITH
GIFTED CHILDREN, AGAINST COMPOSITE SCHOLASTIC PERFORMANCE, MEASURED BY
STANFORD ACHIEVEMENT TEST, FORM B, AS A CRITERION

n = 40

        June 5, 1924    Criterion                             IQ average
Child     CA      EA       EQ     H-B1  H-B2  S-B1  S-B2   H-B1,2  S-B1,2
   35    9-11    16-1      162     151   149   141   163      150     152
    4    10-5    17-0      163     145   137   156   151      141     154
   44     9-4   14-8*      158     140   159   154   174      150     164
    3    10-6   17-2*      164     151   159   167   175      155     171
   20     9-7    16-0      167     134   153   163   160      144     162
   23    10-2    15-2      149     140   150   154   144      145     149
    9    10-5   17-3*      166     121   139   160   159      130     160
   13     9-9   18-10      193     161   147   172   180      154     176
    2    10-8    17-5      165     137   151   175   181      144     178
   30     9-1    15-6      171     151   136   166   174      144     170
    8    9-11   17-3*      174     169   166   171   172      168     172
   26    10-5    15-9      151     136   139   170   156      138     163
   10   10-10  16-10*      156     142   140   171   173      141     172
   22     9-9   15-11      163     144   155   160   152      150     156
    7   10-10    17-1      158     135   145   156   164      140     160
   11    10-8    16-6      154     147   138   162   170      143     166
   27    10-8    16-1      151     127   145   164   159      136     162
   36    10-0    15-3      153     136   155   162   159      146     161
   24   10-11   16-4*      150     133   147   157   162      140     160
   49    10-2   15-0*      148     125   131   141   144      128     143
   50    10-1    13-2      131     121   116   138   150      119     144
   12    10-8    16-4      153     125   138   145   153      132     149
   46    10-3    15-1      147     134   139   142   149      137     146
   33    10-1    14-6      144     133   119   151   147      126     149
   45    10-9    14-7      136     113   124   144   135      119     140
   40    10-1    14-8      145     127   121   135   137      124     136
   51    9-11    14-1      142     143   140   143   143      142     143
   21    10-7    15-7      147     131   131   149   138      131     144
   14    11-3    16-3      145     116   112   141   155      114     148
   29    9-10   14-6*      148     119   123   133   141      121     137
   31    11-0    15-4      139     116   130   145   149      123     147
   38    10-2    15-3      150     129   125   138   141      127     140
   19    11-2   15-10      140     120   134   137   150      127     144
   25     9-5    14-9      157     136   139   149   160      138     155
   39   10-11    15-2      139     123   129   139   152      126     141
   43    10-7   14-5*      137     125   116   140   147      121     144
   47    10-0    14-7      146     142   129   141   153      136     147
   41    10-3   14-11      146     115   120   135   146      118     141
   16    11-6   15-11      138     118   121   150   156      120     153
   15   10-11    16-2      148     135   127   147   144      131     146

Constant error    (entries not preserved in this copy)

* A fraction of a month printed with this EA is illegible in this copy.

in order to enable joint use of the two separate researches by other
investigators, for possible ends not now foreseen. 

It is clearly evident that there is a large constant error of under- 
prediction in Herring-Binet, whereas Stanford-Binet shows no signifi- 
cant constant error. Moreover, the discrepancy between Herring-Binet 
and the criterion of validity is virtually the same as that between Herring- 
Binet and Stanford-Binet, both in sign and in amount. As practice 
takes effect, on repeated Herring and repeated Stanford, the large 
minus error of the former is slightly reduced, while the Stanford 
develops a slight plus error. 


TABLE III. SHOWING THAT THE UNDERPREDICTION OF HERRING-BINET IS AS
GREAT FOR HIGHER AS FOR LOWER, WITHIN THE RANGE ABOVE 135 IQ
(S-B), USING COMPOSITE SCHOLASTIC ACHIEVEMENT AS THE
CRITERION OF VALIDITY

            Higher fifteen                          Lower fifteen
        IQ        Error of  Error of            IQ        Error of  Error of
Child   average   average   average     Child   average   average   average
        S-B1,2    H-B1,2    S-B1,2              S-B1,2    H-B1,2    S-B1,2
    2     178       -21       +13          31     147       -16        +8
   13     176       -39       -17          15     146       -17        -2
    8     172        -6        -2          46     146       -10        -1
   10     172       -15       +16          39     146       -13        +7
    3     171        -9        +7          43     144       -16        +7
   30     170       -27        -1          19     144       -15        +2
   11     166       -11       +12          21     144       -16        -3
   44     164        -8        +6          50     144       -12       +13
   26     163       -13       +12          51     143         0        +1
   20     162       -23        -5          49     143       -20        -5
   27     162       -15       +11          45     140       -17        +4
   36     161        -7        +8          41     141       -28        -5
   24     160       -10       +10          38     140       -23       -10
    9     160       -36        -6          40     136       -21        -9
    7     160       -18        +2          29     137       -27       -11

Constant                                Constant
error             -17.2      +4.4       error             -16.7      -0.3

In order to gain insight into the possibility that the error of
Herring-Binet might increase with increasing degrees of deviation, within the
range being studied here, we selected the fifteen highest IQ’s and
the fifteen lowest, in our group, averaging in each case the two
Stanford-Binets to determine “highest” and “lowest.” We then averaged
the two Herring IQ’s, and calculated the errors of the latter as
compared with the criterion, for these two groups. Table III shows the
error of the Herring to be about the same for the two. Above an IQ
of 135 (S-B), the Herring does not systematically increase its error.
It errs as much for children of 145 IQ as for those of 165 IQ.
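The constant errors of Table III follow directly from the individual errors listed there. The sketch below is illustrative only; it assumes the errors as read from the table, with the three scan-damaged Stanford entries (children 20, 49 and 41) taken as -5, the value implied by the averages and EQ’s of Table II. The names `HIGHER`, `LOWER` and `constant_error` are hypothetical.

```python
# (child, error of average H-B, error of average S-B), from Table III.
HIGHER = [
    (2, -21, 13), (13, -39, -17), (8, -6, -2), (10, -15, 16), (3, -9, 7),
    (30, -27, -1), (11, -11, 12), (44, -8, 6), (26, -13, 12), (20, -23, -5),
    (27, -15, 11), (36, -7, 8), (24, -10, 10), (9, -36, -6), (7, -18, 2),
]
LOWER = [
    (31, -16, 8), (15, -17, -2), (46, -10, -1), (39, -13, 7), (43, -16, 7),
    (19, -15, 2), (21, -16, -3), (50, -12, 13), (51, 0, 1), (49, -20, -5),
    (45, -17, 4), (41, -28, -5), (38, -23, -10), (40, -21, -9), (29, -27, -11),
]


def constant_error(rows, column):
    """Signed mean of one error column (1 = Herring-Binet, 2 = Stanford-Binet)."""
    return sum(row[column] for row in rows) / len(rows)


for label, rows in (("higher fifteen", HIGHER), ("lower fifteen", LOWER)):
    hb = constant_error(rows, 1)
    sb = constant_error(rows, 2)
    print(f"{label}: H-B {hb:+.1f}, S-B {sb:+.1f}")
```

On these entries the Herring constant error is nearly the same in both groups, while the Stanford constant error stays within a few points of zero.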


RELIABILITY 


The variable errors reveal that Herring-Binet arrives at its large 
minus error about as reliably as Stanford-Binet arrives at approxi- 
mately zero error (CE). Neither is more erratic than the other in 
rendering its own typical performance, the one unfaithful, the other 
faithful to the criterion. 


Average errors (regardless of sign) are much larger for Herring- 
Binet. 
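The distinction between variable error and average error can be illustrated with the higher fifteen children of Table III. This is a sketch under stated assumptions: "variable error" is taken here as the standard deviation of the errors about their own mean, which the paper does not define explicitly, and only one subgroup is used, so the figures are illustrative and are not the authors' reported values.

```python
from statistics import mean, pstdev

# Errors of prediction for the higher fifteen children of Table III.
hb_errors = [-21, -39, -6, -15, -9, -27, -11, -8, -13, -23, -15, -7, -10, -36, -18]
sb_errors = [13, -17, -2, 16, 7, -1, 12, 6, 12, -5, 11, 8, 10, -6, 2]


def describe(errors):
    """Return (constant error, variable error, average error) for a list of errors."""
    constant = mean(errors)                   # signed mean
    variable = pstdev(errors)                 # scatter about the constant error (assumed definition)
    average = mean(abs(e) for e in errors)    # mean size regardless of sign
    return constant, variable, average


for name, errors in (("Herring-Binet", hb_errors), ("Stanford-Binet", sb_errors)):
    constant, variable, average = describe(errors)
    print(f"{name}: constant {constant:+.1f}, variable {variable:.1f}, average {average:.1f}")
```

On this subgroup the scatter of the two tests about their own typical error is of similar size, while the average error of Herring-Binet is roughly twice that of Stanford-Binet, since its errors are nearly all of one sign.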


POSSIBLE CAUSES OF THE HERRING-BINET ERROR


It is not the aim of this discussion to determine in a final manner 
the exact causes of the under-estimation of the intelligence of the 
gifted by the Herring-Binet. The fact remains that such under- 
estimation does exist, and that in spite of Herring’s remarkable cor- 
relation (achieved by working on statistical assumptions) with the 
Stanford, of .991. Of course, even with such a high correlation as 
this it would still be possible for the two tests to diverge widely 
at the extreme ends. It might be suspected, though not proved, from
our data that Herring-Binet is weighted for influence at the central
tendency beyond the facts of human nature, thus, in measuring them,
drawing deviates at both extremes in toward the mean. The extreme
deviates are so few in number that the central tendency would not be 
appreciably affected by them. On the other hand, it is possible that 
the Herring runs consistently lower than the Stanford throughout the 
whole range of intelligence. If that is so the correlation can still be 
very high, even though the Herring is arriving at error. 
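Both possibilities named above leave the correlation untouched, as a small numerical sketch shows. The scores below are invented for illustration and are not data from either test; `pearson_r` is a plain implementation of the product-moment formula.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical criterion-test IQ's spread over the whole range.
stanford = [70, 85, 100, 115, 130, 145, 160]

# Case 1: the second test runs a constant 17 points lower throughout.
shifted = [x - 17 for x in stanford]

# Case 2: the second test draws deviates in toward a mean of 100.
squeezed = [100 + 0.8 * (x - 100) for x in stanford]

print(f"r with constant offset: {pearson_r(stanford, shifted):.3f}")
print(f"r with squeezed scale:  {pearson_r(stanford, squeezed):.3f}")
print("difference at 160 IQ (squeezed):", squeezed[-1] - stanford[-1])
```

In either case the coefficient is unity, because correlation measures only agreement in relative standing; a coefficient of .991 therefore gives no assurance that two scales yield the same IQ for a given child.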

It is not yet safe to substitute statistical assumptions for a first 
hand study of human nature. Keeping in mind the failure of the 
Herring as a valid measure of the gifted, it will be well to recall the 
difference in the approaches made by Herring and by Terman in 
revising the Binet-Simon scale. Herring’s attack was mainly
statistical. His examination was based on but one hundred fifty-four
children, including the whole age range, and the results were treated
mathematically until they became rather thin. Terman went out
into the field and worked with flesh-and-blood subjects. To quote
him: “Our revision is the result of several years of work, and involved
the examination of approximately 2300 subjects, including 1700 normal
children, 260 defective and superior children and more than 400
adults.”¹ Three successive revisions were made by him. When the
scale of tests was finally completed, it had as its foundation the 
actual accomplishments of actual minds. Hence it is not surprising 
that in a situation such as the one with which we are dealing the 
Stanford Revision should prove to have a validity not possessed by a 
revision based chiefly upon statistical assumptions. 


EXAMPLE OF FALLACY FOLLOWING FROM USE OF HERRING-BINET


The outcome of using Herring-Binet without knowledge of its
adequacy with deviates, appears to be exemplified in the researches of
Crabbs.² In measuring children with a view to determining the
efficiency of teachers and of supervisors, Crabbs, contrary to other
investigators, found a positive correlation between intelligence and
RAR (accomplishment ratio in reading). “Practically every
investigation that has been made since the AR formula was developed shows
that dull children tend to have higher AR’s than do bright children.
[In our investigation] the subject of reading reverses the common
expectation.” Crabbs also says of her results, “Contrary to
conventional opinion, the teacher of bright pupils is benefitted [in the
ratings] unless these pupils have high mental ages” [are old].

The intelligence tests used by Crabbs were the National Intelligence
Test and Herring-Binet. The investigator does not give results from
these tests separately; so we cannot tell how many of the IQ’s entering
into the reckoning were determined by means of Herring-Binet. But
to whatever extent this test was utilized to measure plus deviates
(and very possibly this is true for minus deviates, as well), to that
extent we must evidently expect the atypical results which were
discovered by Crabbs; for the very bright (as measured by H-B) are
really much brighter than they are recorded as being, and hence


1 Terman, L. M.: “The Measurement of Intelligence.” Houghton Mifflin,
Boston, 1916, pp. 51-52.

2 Crabbs, Lelah M.: “Measuring Efficiency in Supervision and Teaching.”
Teachers College, Columbia University Contributions to Education, No. 175, 1925.
“accomplish” more than would be their quota by Stanford-Binet,
whereas the dull (if the invalidity of H-B holds also for minus deviates)
would be actually duller than they appear by their IQ’s on
Herring-Binet, and hence would “accomplish” less than would be their quota
by a really valid test of intelligence. This would result in positive
correspondence between amount of IQ and amount of AR, which would,
of course, be in contradiction to previous results. Previous results
have been founded, as the literature will show, largely upon
Stanford-Binet, and other tests similarly standardized on actual children.

It is thus clear that criteria of ability to teach, founded upon 
Herring-Binet, will show teachers of young gifted children to be effi- 
cient beyond their actual merits; whereas teachers of the dull may be 
mistakenly penalized. Although Crabbs sought with well-directed 
logic to explain her peculiar results, the explanation here made avail- 
able did not occur to her. 


CONCLUSIONS 


1. Herring-Binet is not an alternate for Stanford-Binet, insofar 
as gifted children are concerned. On the average the former rates 
gifted children about seventeen points lower in terms of IQ than does 
the latter. 

2. Invalidity rests with Herring-Binet, since when the criterion 
of subsequent scholastic success under conditions of full opportunity 
is applied, Herring-Binet makes, on the average, a minus error of pre- 
diction, amounting to about eighteen points of discrepancy between 
IQ and EQ. 

3. Herring-Binet and Stanford-Binet are about equally reliable 
in arriving, the former at error, the latter at truth, in the case of a 
group of gifted children. The average error (average amount of 
falling away from the criterion regardless of direction) is, however, 
much greater for Herring-Binet. 

4. The results of the present study are limited to the case of 
extreme plus deviates. Whether and in what amounts the invalidity 
of Herring-Binet may extend to other sections of the frequency surface 
is not revealed by our data. 

5. The inference emerges that instruments for mental measurement
devised upon the basis of statistical assumptions only, should be
regarded without confidence until validated by trial with populations.
INCIDENTAL LEARNING
RAYMOND R. WILLOUGHBY 
Clark University 


(Concluded from December Issue) 


SUBSTITUTION AND RECALL ON DIFFERENT EDUCATIONAL LEVELS 


Several ways of attacking this problem are available; comparison 
might be made of the central tendencies and variabilities of each of 
several groups widely separated in educational advantages, either 
without or with selection on the age basis as a preliminary; group 
divisions of the same sort might be made and the aid of contingency 
or tetrachoric methods invoked; or a single product-moment coeffi- 
cient might be computed for the entire group, irrespective of age, 
and the group subsequently analyzed in greater detail. The last 
method has been tried, with somewhat startling results: 


r (substitution score, school grade) = .797, PE .015


At any rate, there seems little doubt of the fact of significantly 
high correlation; and inasmuch as the score distribution is truncated 
at the top, the true correlation is probably even higher. Although 
the zeta-test for linearity was not applied, the distribution appears 
from the scatter a trifle curvilinear. 

Just what is being measured here, however, is not so clear. In 
view of the character of the test, it seems at least unlikely that it is 
any considerable effect of school training (beyond the ability to 
manipulate a pencil, etc.) on ability to substitute symbols for digits. 
A more reasonable hypothesis would seem to be that the function 
measured is largely age (i.e., mental maturity) and partly the survival
value in school of the mentally quick, who are also able to make good 
scores in the test. The accurate isolation of these factors and others 
is a problem in partial correlation involving measures not at hand, 
but some estimate can be arrived at by correlating the same variables 
for basics only, thus eliminating the age factor: 


r (substitution score, school grade; basics) = -022, PE .056


The latter factor mentioned (the survival value in
school of mental quickness) is therefore the more important one.
Again, it is probably even more important than the figures indicate, 
since the reduction of the variability through the removal of all
grades below the third and above the tenth has reduced the
correlation; correcting for this truncation, we get a better value for this
coefficient:


r (score, grade; basics, corrected for truncation) = .986, PE .029


Ruling out the age factor, therefore, and keeping the total number 
of grades actually represented, we find almost a perfect correlation 
between substitution ability as measured by this test and school 
grade; so that the suspicion is strong that the test enables a good
estimate of whatever traits make for success, or at least progress, in 
school. 

In the matter of recall, the distribution is again truncated at the 
top, but no method exists for estimating the amount of truncation; 
accordingly the obtained value of 


r (recalled elements, school grade) = .686, PE .022


is probably somewhat too low. Both functions, then, are highly 
and significantly correlated with school grade, the implications being 
that both quickness and plasticity figure largely in survival and 
progress in school. 


CONDITIONS ACCOMPANYING ESPECIALLY GOOD OR ESPECIALLY
POOR ABILITY IN THE FUNCTIONS


Excellence. “Good ability” in the substitution function has
been defined arbitrarily for present purposes as the ability to substitute
correctly (except that “N” is allowed for 2) eighty or more elements
on the test in the time allowed; similarly, “good ability” to recall is
the ability to write from memory eight or all of the nine symbols
corresponding to the digits. Going through the data for individuals
possessing these grades of ability, we find twenty-seven who possess
good substitution ability, seventy-eight good recall ability, and
twenty-seven (additional) with both. Evidently, then, the two
abilities are by no means closely correlated and, particularly, good
recall ability in nowise depends on good ability to substitute. Fifty
per cent of those having good substitution ability have also good
recall ability, but thirty-nine per cent of those having high recall
ability had good substitution ability. The tendency is for excellence
in each alone to be a feminine function, but for excellence in both
to be a masculine one. Of the one hundred thirty-two individuals
having some sort of good ability, fifty were men; the corrected value
for the females (to allow for the preponderance in the population of 
three hundred) is sixty-six. The corresponding values for the ability 
groups are: 


                            M     F     F, corrected
Good substitution only      6    21     17
Substitution and recall    14    13     10
Good recall only           32    46     37


Age analysis of these good ability groups brings out very strikingly 
that incidental learning is a function of youth; the mean age of the good- 
substitution group is thirty-one years, of the substitution and recall 
group twenty-four years, and of the good-recall group seventeen 
years. This being a somewhat important point, the distributions 
are given in full: 


Good Substitution Only 
The mean recall score for the twenty-seven individuals making
high substitution scores was 5.5. The mean substitution score for
the seventy-eight individuals having recall scores of eight or nine
was sixty-three. These are both decidedly mediocre scores, indicating
again that the ability to substitute well, or to form incidental
associations well, is not necessarily related to the other of these two abilities.

Inferiority. “Poor ability” in these functions may be defined,
also arbitrarily, as ability to make a score of forty elements or less
in substitution, and three elements or less in recall. The poor-ability
group in substitution alone then numbers twenty-three, the
poor-recall group thirty-two, and the group poor in both nineteen; i.e.,
forty-five per cent of those unable to substitute well were also unable
to recall well, and thirty-seven per cent of those unable to recall
well were also unable to substitute well. The mean of elements 
recalled by those having a low substitution score was six—higher 
than for the group making high substitution scores; the mean substitu- 
tion score for those recalling few elements was 57.5, only somewhat 
lower than for those recalling most elements. All indications point, 
as before, to the independence of the two functions; a correlation 
would probably show the same thing. 
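The two percentages quoted for the poor-ability groups can be checked from the three group sizes. A minimal sketch follows; the function name `share_also_poor` is illustrative and is not the author's notation.

```python
def share_also_poor(only_this: int, both: int) -> float:
    """Per cent of those poor in one function who are also poor in the other."""
    return 100 * both / (only_this + both)


poor_substitution_only = 23
poor_recall_only = 32
poor_in_both = 19

sub_also_recall = share_also_poor(poor_substitution_only, poor_in_both)
recall_also_sub = share_also_poor(poor_recall_only, poor_in_both)

print(f"poor substituters also poor in recall: {sub_also_recall:.0f} per cent")
print(f"poor recallers also poor in substitution: {recall_also_sub:.0f} per cent")
```

Each percentage takes as its base everyone poor in the one function, that is, the "only" group plus the group poor in both.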
There are no significant sex differences in inferiority: 


                            M     F     F, corrected
Poor substitution only     10    13     10
Substitution and recall     8    11      9
Recall only                16    16     13


The interpretation of the mean age data offers difficulties. Perhaps 
the best hypothesis is to suppose the difference between the poor- 
recall and substitution and recall groups to be due to chance, and its 
advance over the good-ability groups to indicate increasing difficulty 
in both functions with age; and the low age of the poor-substitution- 
only group, which can hardly fail to be significant, may show that 
recall functions well in youth in spite of the fact that in this group 
the substitution ability has not been well established; the means are: 


Poor recall only ................................ 33 years
Poor substitution and recall .................... 35 years
Poor substitution only .......................... 22 years


It should be borne in mind, however, that these means are some- 
what misleading, as both functions are low at both ends of the age 
scale, and the “age-poor ability” distributions are therefore bimodal.


THE INFLUENCE OF SUGGESTION ON RECALL 


An opportunity to investigate the effect of suggestion in recall
is afforded by the form of the symbol corresponding to two, which is
that of a mirrored “N”. The confusion of this symbol with a true
N may be referred to as the “N-error,” and may be considered as a
measure of suggestion. Such an error occurred in some form in
one hundred eight cases out of the three hundred, or about thirty-five
per cent. These were of four classes, roughly grouped: (1) The
“defective-observation” type, where suggestion was perfectly
efficacious; i.e., the subject wrote every two in both substitution and
recall as an N, never, apparently, having perceived any discrepancy;
(2) the “resistance” type, where a struggle to overcome the suggestion
was evident in the substitution (as by writing some correctly and
others as N), but suggestion triumphed in the recall; (3) the “delayed”
type, where all two’s were written correctly during the substitution,
but in the recall, the key no longer being present for comparison, the
N-error appeared; and (4) the “overcome” type, in which difficulties
appeared in the substitution (as in the “resistance” type) but the
symbol was correctly recalled. The last was the most frequent type,
and seems to be prevailingly a masculine characteristic; no other
significant sex differences are evident:


                      DEFECTIVE
                      OBSERVATION   RESISTANCE   DELAYED   OVERCOME   TOTAL
Male                       8            11          10        27        56
Female                     7            12          13        20        52
Female (corrected)         6            10          10        16        42
Total                     15            23          23        47       108


An examination of the mean ages of the groups making the four 
types of N-error shows little difference between the resistance and 
delayed types (25.5 and twenty-seven years respectively), these being 
presumably of the same sort psychologically and differing only in 
degree. These groups, however, are older, and probably significantly 
older, than either the group which did not succumb to the suggestion 
though affected (20.7 years) or the group which never perceived the 
discrepancy (twenty-two years). This is not easy to explain (unless 
as a chance variation) but suggests the presence of a brighter and a 
duller group of children or better, perhaps, groups of higher and lower 
suggestibility respectively, which may or may not be correlated 
with ability. So far as substitution ability is concerned, there is 
perhaps slight evidence in the fact that the mean score for the “‘over- 
coming” group is fifty-nine and for the “defective-observation”’ 
group fifty-six, though the mean ages are reversed. The probable 
true state of affairs, however, is indicated by the fact that the twenty- 
six-year-old groups have a mean score of sixty-six, while the twenty- 
one-year-old groups score fifty-seven, i.e., the really significant factor 
is age, as before, not suggestibility. The age-score means are best 
presented in a brief table: 


          DEFECTIVE
          OBSERVATION   RESISTANCE   DELAYED   OVERCOME
Age           22           25.5         27       20.7
Score         56           67           65       59


TRANSMISSION OF THE ABILITY TO SUBSTITUTE AND RECALL


The number of cases available in the present data for the study of 
transmission in the abilities under consideration is rather small, 
and becomes even smaller when certain desirable deductions have 
been made. These are chiefly of two sorts: (1) Those for age; it has
been shown that the means of the age arrays decline regularly with 
age, with the exception of the array in the neighborhood of thirty- 
three years, which probably represents parents of inferior intellectual 
status. Accordingly no parent has been chosen whose age is below 
thirty-five or above fifty; the means in the group so selected are 
probably not so widely different but that a fairly valid comparison 
may be made. (2) No parent (or child) has been chosen whose score 
in substitution was perfect, since the true score in such cases is not 
known. The same precaution should have been taken ideally in the 
case of the recall, but no worthwhile study of recall would have been 
possible had so large a number been omitted. After these deductions 
were made (it may be remarked that their effect is to lower the
correlation below the probable true correlation) the number of “complete”
family groups (basic, mother, father) left for study was nineteen,
besides several more consisting of the basic and one parent. These
were studied by the correlation method in some detail for the
substitution ability, and the three most significant coefficients in recall
computed in addition. They show the same type of non-sex-linked
transmission observed by Miss Margaret Cobb in her study of the
“inheritance” of arithmetical ability by the same method, viz.,
the child appears to resemble in his ability one parent or the other,
but not both; and there is no ascertained relation having to do with
transmission between sexes or in the same sex.

The most general coefficient is that between basic and average of 
the two parents. It is: 


r (basic and midparent), substitution = [illegible], PE .13
r (basic and midparent), recall       = .30, PE .13


The similarity of these coefficients is worth noting; they seem to 
indicate that the two abilities (whose independence of each other is 
shown in other connections) are nevertheless transmitted in about 
the same way. The number of cases here is nineteen. A similar 
correlation was found between basic and father (twenty-five cases) 
and boys and fathers (eleven cases), but aside from these somewhat 
questionable ones there is no trace of anything significant in any of
the sex relationships: 


                                      r      PE    Cases
Basic and mother, substitution       .12    .10     44
Boy and mother, substitution        -.03    .14     22
Girl and mother, substitution        .20    .14     22
Basic and father, substitution       .37    .12     25
Boy and father, substitution         .39    .17     11
Girl and father, substitution        .10    .18     14
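
The probable errors printed above agree with the conventional formula PE = 0.6745(1 - r²)/√N. The text does not say which formula was used, so the following Python sketch is offered only as a consistency check under that assumption; the function name is illustrative and not taken from the source.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Conventional probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n).

    Assumption: the article does not name its formula; this is the usual one.
    """
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# (pairing, r, PE as printed, number of cases), copied from the table above
rows = [
    ("basic-mother", 0.12, 0.10, 44),
    ("boy-mother", -0.03, 0.14, 22),
    ("girl-mother", 0.20, 0.14, 22),
    ("basic-father", 0.37, 0.12, 25),
    ("boy-father", 0.39, 0.17, 11),
    ("girl-father", 0.10, 0.18, 14),
]

for label, r, printed_pe, n in rows:
    computed = probable_error(r, n)
    print(f"{label:13s} r={r:+.2f}  N={n:2d}  PE computed={computed:.3f}  printed={printed_pe:.2f}")
```

On this reading a coefficient such as .37 with PE .12 is about three times its probable error, which bears on the description of these correlations as “somewhat questionable.”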


Miss Cobb’s method of correlating basic with more similar parent 
gives the only high or significant correlations of the group. They are: 


r (basic and more similar parent), substitution = .69, PE .08
r (basic and more similar parent), recall       = .56, PE .11


These are based on the same nineteen cases, and again their com- 
parative closeness is worth noting. Here, however, a rather important 
point makes its appearance. A priori, it is not unlikely that a moder- 
ately high correlation might be secured by comparing the first of any 
three series, taken at random, with the nearer of the other two. Accor- 
dingly the experiment was tried. Each basic in the list was correlated 
with the mother of the basic next after him in the list (list alphabetical, 
except for later arrivals) or with the father of the basic second 
from him, whichever he was most like. The resulting coefficient is 
insignificant (r = .21, PE .15), so that the method may be regarded 
as approximately valid subject to further verification.¹ The corresponding
coefficients for less similar parent show complete absence of
relationship, both in the true and the chance series:


r (basic and less similar parent), substitution = .07, PE .15
r (basic and less similar parent)               = -.14, PE .13
r (basic and less similar parent), recall       = .16, PE .15
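
The a priori concern raised above, that picking the nearer of two unrelated series may by itself produce a correlation, can be examined by simulation. The Python sketch below uses purely synthetic and mutually independent scores (standard normal deviates, which is an assumption; none of the study's data are used) and pairs each “basic” with whichever of two unrelated “parents” lies nearer to him. All names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_series(n_cases, seed=0):
    """Correlate each basic with the nearer and with the farther of two
    scores drawn independently of it (no true relationship exists)."""
    rng = random.Random(seed)
    basics, nearer, farther = [], [], []
    for _ in range(n_cases):
        basic = rng.gauss(0.0, 1.0)
        first = rng.gauss(0.0, 1.0)
        second = rng.gauss(0.0, 1.0)
        if abs(first - basic) <= abs(second - basic):
            near, far = first, second
        else:
            near, far = second, first
        basics.append(basic)
        nearer.append(near)
        farther.append(far)
    return pearson_r(basics, nearer), pearson_r(basics, farther)


r_near, r_far = chance_series(5000)
print(f"nearer of two unrelated scores:  r = {r_near:+.2f}")
print(f"farther of two unrelated scores: r = {r_far:+.2f}")
```

Under these assumptions the selection step alone gives a positive coefficient for the nearer score and a negative one for the farther, so a chance series of the kind tried here is a necessary control rather than a formality.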


The marital correlation (between father and mother) the effect 
of which is to raise the parental correlation and others slightly, is of 
doubtful significance here, but probably not greatly effective (r = .24, 
PE .15). 





1 A later attempt to repeat the verification on other tests of the battery failed. 
I am uncertain as to the exact logical status of this method, but the coefficients 
are presented as found for what they may be worth. 


SUMMARY OF RESULTS


1. There is a low, but so far as it goes, significant relationship 
(r = .37) between ability to recall associations formed incidentally and 
the amount of practice on such associations (within the limits of the 
experiment). 

2. There are nowhere any significant sex differences in either 
ability to recall or ability to substitute in the original test. 

3. There are striking and regular variations in both functions 
with age. Ability seems to rise sharply and regularly from the 
earliest age at which a score is possible to a mode at about twenty-one 
years, and then to decline gradually to old age. 

4. Differences due to nationality are uncertain; English and 
“American” groups appear to resemble each other, however.

5. The principal condition of good ability in recall is youth (age 
10-25 or thereabouts); good ability to substitute is not a condition 
of ability to recall, and neither, probably, is sex or school grade (except 
as measuring age). The factors making for good ability to substitute 
are not clear, but here also probably the chief ones are youth and 
intelligence. Inferiority in these abilities is not easy to evaluate, but 
probably is due to very low or high age and inferior intellectual 
status. Heredity is also possibly a significant factor. 

6. Suggestibility in recall is rather effective; of the various types 
of suggestibility, low resistance is probably significant of low mental 
age, and temporary resistance, with later succumbing, of higher 
chronological age. 

7. Ability to substitute is highly indicative of general mental 
ability as measured by this battery (r = .80); but there is total 
absence of relationship between general mental ability and ability 
to recall. 

8. There is a high and significant relationship between school 
grade (highest school grade reached) and both ability to recall and 
ability to substitute; indications are strong, however, that this is 
almost exclusively due to the survival and progress value of the abili- 
ties (or others correlated with them) in school work, rather than to 
any effect of education on them. 

9. Work on a limited number of cases suggests that children 
resemble one parent or the other predominantly, but not both; but 
that the sex of this more similar parent is not significant, nor is that 
of the child to whom the ability is transmitted. The hypothesis is 
perhaps worth investigating that there is Mendelian inheritance, not
sex-linked. 


BIBLIOGRAPHY 


Baird: “Rôle of Interest in Mental Functioning.” Philosophical
Essays in Honor of James Edwin Creighton, 1917, pp. 307-317.


Baird’s contribution is chiefly in the nature of a summary of work 
done, with comments of his own on its significance. His thesis is 
the usual one that we learn only that which we intend to learn, and 
he cites in illustration the different possible intents in reading—the 
material, the typographical accuracy, etc. “Certain stimuli find
access to consciousness, while others find their entrance barred.”
Myers is cited as concluding that students observe only that which
they intend to observe; Külpe, that the feature of a complex to which
attention is directed is observed with “two-fold” (!) greater accuracy
than those not emphasized, but that some features are inherently
more intent-directing, i.e., interesting. Radossawljewitsch is referred
to as observing the case of a subject who read a series of nonsense 
syllables forty-six times, without understanding (on account of 
language difficulty) that he was to memorize them, and showed less 
ability to recall them than subjects who had read them ten times 
with intent to remember. Aall finds that delayed reproduction 
(for one month) was more accurate when the subjects had learned 
with the intent to remember for an extended period than when they 
had intended only to remember for a few days. Meumann also finds 
that differentiation in intent in learning leads to differentiation in 
effect of the learning. The writer concludes with some reflections on 
the importance of intent: “flight of ideas” is thought ungoverned by
intent; obsessions are thought dominated by inflexible intent. Intent 
is the selective factor making a choice between the multitude of 
associative bonds possible, and therefore enabling purposive action— 
much like Thorndike’s “set.” 

The article is not of prime relevance but is useful as a short 
summary of the most important work done. 


Myers: A Study in Incidental Memory. Archives of Psychology, 
1913, Vol. IV, pp. 108. 


A large amount of knowledge, says Myers, is acquired incidentally;
but many familiar relations are never learned, for the conditions of
accurate recall are that the original perception be made with respect
to experience and utility. His historical chapter points out the rela- 
tions of the subject particularly to the psychology of testimony, and 
also to suggestibility and advertising. The experimental findings 
are roughly the following: The sizes and proportions of familiar 
objects (dollar bill, coins, etc.) are underestimated or overestimated 
almost universally. The error decreases with increasing age and 
experience, and is in general greater for females. With recall of words 
presented in another connection, there is little age difference between 
children and college men for immediate recall, but the loss in efficiency 
due to a lapse of time is greater in the lower ages; women in general 
are more efficient than men. With the “letter square,” a complex
in which the attention is directed to one component and recall of the 
others required, there is very low accuracy in nearly all details. Here 
again females are superior to males. There is also low efficiency in 
several auxiliary tests, such as observation of a familiar watch dial, 
the dates of familiar events, and the rapid estimation of letters in a 
word. Myers draws the conclusions that the school is fulfilling its 
function best when it teaches the child to study rather than to recite, 
and that in court the preponderance of evidence may be quite 
erroneous. 

One is impressed with the fact that barring the first admission, 
repeated at the last (that much knowledge is acquired incidentally), 
no positive aspect is emphasized. The study seems carefully done, 
but the impression remains that it could be rather easily attacked on a 
statistical basis. 


Boswell and Foster: On Memorizing with the Intention Permanently
to Retain. American Journal of Psychology, 1916,
pp. 420-426.


This is a Cornell study of no great present bearing. Four subjects 
were used, and the material consisted of association-pairs of Chinese 
words and their English equivalents. There were two series, differing 
only in instructions—one for permanent retention, the other for 
temporary retention; and the members of the two series were some- 
times given in the same hour. Apologies are made for the unreliable 
nature of the introspections secured from the comparatively untrained 
subjects. In the reviewer’s opinion they are more needed for the 
probable errors involved and the lack of controls. Furthermore the 
















22 The Journal of Educational Psychology 


differences found were rather slight. Nevertheless, the experimenters 
believe that within the limits of their experiment (i.e., in vocabulary
material) the intent to learn for permanent retention actually brings 
about such retention. Some German work is cited in addition. 


Brown: Incidental Memory in a Group of Persons. Psychological 
Review, 1915, pp. 81-85.


This is a study from the University of California, comparing in 
rather simple and effective manner the quality of the material remem- 
bered by persons with that remembered by a group. The subjects 
were one hundred seventy-five students, who were asked to write 
down the names of all the advertisements they could remember in a 
given time. It was found that the items mentioned by most students 
tended to be first remembered by each student; that is, that the func- 
tion is qualitatively homogeneous for a group. The converse is also 
true, tending to strengthen the conclusion. Brown says that there 
is evidence from his investigation that poor incidental memory (i.e., 
short lists) involves no abnormality other than general poverty, 
or “weakness,” of association. The study was checked some time
later on another large group, and the original findings amply confirmed.
The size of the group studied makes the reliability of the results
fairly high.


ORDAHL: Consciousness in Relation to Learning. American Journal 
of Psychology, Vol. XXII, 1911, pp. 158-213. 


This is a long and carefully worked out study of much merit, but 
only parts of it bear upon the matter of incidental or unconscious 
learning; those parts which do, however, are exceedingly illuminating 
(see section on Subconscious and Unconscious Learning). Mrs. 
Ordahl considers that unconscious learning is largely a physiological 
matter; she admits, however, the factor of “instinctive imitation”
and cites instances to show its efficacy. Numerous classic laboratory 
experiments are drawn upon to show the maturation of a habit apart 
from its use, e.g., in the scattering of practice. In learning of any 
sort, she says, there are conscious and unconscious factors; learning 
can progress without consciousness of the end or of the fact that one is 
learning, but a high degree of attention brings more marked results 
than work under distraction. 


Kirkpatrick: An Experiment on Memorizing vs. Incidental Learning.
Journal of Educational Psychology, Vol. V, 1914, pp. 405-412.


Kirkpatrick endeavored to obtain evidence as to whether the 
multiplication table (and by implication school material in general) 
might be better acquired through practice, incidentally, or memorized 
and then applied. He worked with normal school students and 
children, the material comprising the products of the numbers from 
seventeen to fifty-three by seven. One group practiced writing the 
products (after a preliminary test to equalize the rates) without the 
knowledge that they were products; the other memorized them as 
products for five or six days and then began to write. The memorizers 
were better on the second day of their substitution, but had fallen 
behind by the tenth day. After an interval of two weeks, a short 
review was given both groups, and they then wrote as many products 
as they could in two minutes, without the key sheet; the memorizers 
averaged 40.9 products, the practisers 46.2. In another experiment, 
the relative merits of practising as before and of computing products 
as they went were investigated with groups of young women. The 
final test yielded averages of 25.4 for the practisers and 44.3 for the 
computers. In a third rather irregular experiment with children 
the superiority of computing to practising was rather definitely shown, 
and likewise a tendency for children who had practised and memorized 
to be superior to those who had only memorized. In conclusion, the 
author states that there is little evidence of any advantage gained by 
memorizing unassociated with practice; and there is marked evidence 
of the efficiency of using previous knowledge as the individual is 
learning. As practical applications in educational practice he points 
out that nowadays the alphabet is taught quite as effectively inci- 
dentally while learning to read as it was formerly by a prolonged 
drill period; and suggests that it is quite likely that it is best to learn 
products incidentally and by computing them, rather than by drill 
alone. 


AN INQUIRY INTO THE STANDARDIZATION OF THE
FERGUSON FORM BOARDS* 


E. D. MacPHEE 


Associate Professor of Psychology, University of Toronto 
AND 


A. J. BROWN 


Mental Hygiene Division, Department of Health, Toronto 


In 1920,¹ Professor Ferguson reported the construction and tentative
standardization of a series of Form Boards. The series consisted
of six boards of equal size, with all the holes of geometrical shapes, 
but with no two holes alike. The norms published by Ferguson 
showed constant grade increments from Grade I of the public school 
to the third year of university. It required as a maximum thirty 
minutes to administer and with the older children this time was very 
materially reduced. The series seemed to be a promising one for 
clinical use and an enquiry into its standardization was undertaken. 
The reader is referred to the original article for details of construction 
and for instructions for giving and scoring. 

The grade-norms published by Ferguson are of very little clinical 
value. The senior author in a study now in progress finds that within 
each grade in a typical school in this city there is an age spread (both 
chronological and mental) of four to six years. A statistical average 
under such circumstances is a pure fiction. Ferguson gives no evidence 
as to how his sample was obtained, or the safeguards he adopted to 
make the sample a reliable one. In the absence of any objective 
measure of grade-standing, it is quite impossible to assume that the 
term “grade” connotes any very specific attainment, intellectual or
educational. 

For our enquiry we adopted the following principles: 

1. A random sample of children of certain age groups was to be 
tested by the same examiner. 

2. The examiner (junior author) was to practice giving and scoring 
the tests with a group of children not to be used later in the study. 





* The investigation here reported was carried out by Miss Brown, under the 
supervision of the senior author and was submitted in partial fulfilment of the 
requirements for the degree of Master of Arts, in the Department of Psychology, 
University of Toronto, June, 1927. 

3. The instructions for administering the test, as cited by Ferguson
in his original article, were to be followed explicitly. 

4. The norms to be established in this enquiry should be chrono- 
logical age norms. The presence of age-increments was taken as the 
fundamental criterion of validity. 

5. Should the Ferguson Scoring rules fail to give clear age-incre- 
ments, alternative scoring techniques should be employed with a view 
to obtaining if possible clear-cut age norms. 

6. Every Nth case at each age level should be re-examined after a 
fixed interval, to afford evidence of reliability. 

7. Group verbal intelligence tests should be given to all pupils 
examined with the Ferguson series. If correlations should turn out 
to be high and reliable the Ferguson Series might be considered an 
alternative to a linguistic test for educational classification. If the 
correlation were low such a recommendation could not be made.


SELECTION OF RANDOM SAMPLE OF PUPILS 


This was accomplished by a series of selections. 

1. Two typical schools were selected. The basis of this judgment 
was the per cent of mental defect in these schools. The Mental 
Hygiene Division of the Toronto Public Health Department has made 
continuous surveys in the city schools for several years and data were 
available as to incidence of mental defect in each of the city schools, 
and in the school population as a whole. It was assumed that differences
in the per cents of subnormals were an adequate measure of
differences in the general intellectual level of the school population 
and two average schools in different sections of the city were chosen. 
Within these two schools a selection was made of children of certain 
age groups. 

2. At seven years of age a considerable number of children are not 
yet in attendance at public schools and at thirteen years and above 
many of the brighter children have been promoted to high schools and
collegiates. It is possible then to obtain in a public school a completely 
random sample of only the eight, nine, ten, eleven and twelve year 
groups. We obtained at the beginning of each week a list of the names 
of children who in that week would be within one month of their 
birthday, above or below, irrespective of the grades in which they 
were found. All such children were examined. No pupil included 
in our reports falls outside this age group and no child in these age 
groups was excluded from our data. 










It could be fairly assumed, we believe, that these precautions 
gave us a random sample of public school children in this city of the 
age groups selected for study. The age-grade distribution of cases 
examined is shown in Table I. It is recognized that the number of 
cases is too small to warrant definite conclusions, but it should be 
sufficient to indicate trends. In other words if the Ferguson Form 
Board is a valid performance test age-increments should be expected 
to appear in scores of the groups tested. 


TABLE I. AGE-GRADE DISTRIBUTION¹

         Junior  Senior  Junior  Junior  Senior  Junior  Senior
Age        I       I       II      III     III     IV      IV    Auxiliary   Total
 8         10      14      ..      ..      ..      ..      ..       ..         24
 9         ..      11      14      ..      ..      ..      ..       ..         25
10         ..       2      16       8       1      ..      ..        1         28
11         ..      ..       5      13       5       1      ..        1         25
12         ..      ..       4       7      11       7       3       ..         32
Total      10      27      39      28      17       8       3        2        134
































1 The elementary school course in Ontario comprised seven grades, each repre- 
senting one year of work. Junior I is then the equivalent of Grade I, Senior I the 
equivalent of Grade II, etc. 


TYPES OF DATA COLLECTED


1. Time Record.—The time required by each pupil to complete 
each board was noted in seconds. No pupil was allowed to continue 
at any board longer than five minutes. A special record blank was 
prepared for recording individual results. 

2. Scores.—Ferguson provides a five point score for transmuting 
time taken per board into a point score. His table is as follows: 





TABLE II

Time in seconds    0-29    30-59    60-99    100-149    150-300    301
Score                5       4        3         2          1        0























The time record of each pupil on each board was transmuted into 
the appropriate scores in accordance with Table II. 
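
The transmutation in Table II can be sketched in Python as follows. The band limits are those printed in the table; the function names, the use of `None` for a board abandoned at the five-minute limit, and the example pupil are illustrative assumptions and not part of Ferguson's procedure.

```python
from typing import Optional, Sequence

# (upper limit in seconds, points) for each band of Table II
FERGUSON_BANDS = ((29, 5), (59, 4), (99, 3), (149, 2), (300, 1))


def ferguson_points(seconds: Optional[float]) -> int:
    """Point score for one board.

    None stands for a board not completed within the time limit
    (an assumption about how such records are stored); it scores 0.
    """
    if seconds is None:
        return 0
    if seconds < 0:
        raise ValueError("time cannot be negative")
    for upper_limit, points in FERGUSON_BANDS:
        if seconds <= upper_limit:
            return points
    return 0  # 301 seconds or more


def ferguson_total(times: Sequence[Optional[float]]) -> int:
    """Total score over all boards given to a pupil."""
    return sum(ferguson_points(t) for t in times)


# Hypothetical pupil: times on Boards 1 to 6, the last two not completed
example_times = [25, 48, 130, 290, None, None]
print(ferguson_total(example_times))  # 5 + 4 + 2 + 1 + 0 + 0 = 12
```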

3. Retests. The scores made by each group on the initial test
were arranged in a distribution from highest to lowest, and every
fifth case was retested after a lapse of not less than six weeks and
not more than two months. The retest scores were obtained and the
simple correlation of the first and second scores was computed by
the product-moment method.
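
The retest procedure amounts to a systematic sample from the ranked scores followed by a product-moment correlation. A minimal Python sketch, using invented scores for fifteen pupils (the study's individual records are not given) and hypothetical function names, might run:

```python
import math


def every_nth_case(scores, n=5):
    """Rank pupils from highest to lowest score and return the
    positions (indices into `scores`) of every n-th pupil."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[n - 1::n]


def pearson_r(first, second):
    """Product-moment correlation of first and second scores."""
    k = len(first)
    mean_a = sum(first) / k
    mean_b = sum(second) / k
    sab = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    saa = sum((a - mean_a) ** 2 for a in first)
    sbb = sum((b - mean_b) ** 2 for b in second)
    return sab / math.sqrt(saa * sbb)


# Invented initial scores for one age group
initial = [14, 9, 11, 7, 12, 10, 8, 13, 6, 15, 9, 11, 5, 12, 10]
chosen = every_nth_case(initial)            # pupils to be retested
first = [initial[i] for i in chosen]        # their first scores
second = [13, 9, 6]                         # invented retest scores
r = pearson_r(first, second)
print(chosen, first, second, round(r, 2))
```

Taking every fifth case from the ranked list spreads the retested pupils over the whole range of scores, which a haphazard choice would not guarantee.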

4. Verbal Intelligence Tests. The National A-1 was given to
pupils in Grades III to VII and the Pintner-Cunningham, Form A to 
pupils in Grades I and II. Correlations of total scores in each case 
with scores in the Ferguson Series were computed. 

5. A technique for the analysis and recording of comments on the 
performance of subjects during the test was devised. Record Blank 
No. 2 was used for this purpose following an initial period of observa- 
tion and classification. This was undertaken because of the frequent 
comment in the literature that performance tests allow the examiner 
to observe certain supplementary data, e.g., attitude, speed of move- 
ment, stability and so on. 


RESULTS 


It seems advisable for purposes of clarity to present the results 
under three main headings: 

1. Results obtained with the Ferguson rules for scoring time records. 

2. Alternative methods of scoring time records. 

3. A qualitative analysis of performance. 


FERGUSON SCORES 


Time records on the initial test were scored and distributed on age, 
grade and sex bases. 

A. Age Scores. Table III gives the median scores, with average
deviations, for five age levels (8, 9, 10, 11, 12) on each of the six
boards and for the total series. No difference appears in the average
scores of the eight and nine year groups. There is an increase at
the ten year level of two points but the twelve year scores show no
increase over those of the eleven year group. An examination of
the records made on the individual boards reveals the fact that the
first board is not discriminative at age levels, the average for the
five ages being approximately constant. There is an increase in
scores in each of the other boards but it is so gradual that it is
impossible to assign age growth to any individual board.


TABLE III. MEDIAN SCORES AND AVERAGE DEVIATIONS FOR AGE LEVELS,
FERGUSON METHOD OF SCORING

      Number                                      Boards
Age  of cases       1            2             3            4           5            6            Total
 8      24      4.0 ± .7     2.7 ± .9      1.0 ± .8     .64 ± .5    .08 ± .1     .2 ± .3        8.6 ± 2.2
 9      25      4.(?) ± .7   2.4 ± .1(?)   1.8 ± 1.0    .76 ± .6    .04 ± .1     .04 ± .08      8.5 ± 2.6
10      28      4.2 ± .6     3.1 ± .9      1.6 ± .9     1.0 ± .6    .28 ± .4     1.5 ± 1.3(?)  10.6 ± 2.3
11      25      4.1 ± .8     3.6 ± .6      2.0 ± .8     1.3 ± .7    .64 ± .6     .44 ± .5      12.4 ± 2.8
12      32      4.3 ± .7     3.4 ± .6      2.1 ± .1(?)  1.4 ± .8    .6 ± .6      .4 ± .6       12.5 ± 3.5

B. Grade Scores. Table IV gives the median scores, on the total
series, for grade levels. The median scores show increases at each
grade level up to Junior III, with no increase thereafter, but little
reliability can be placed in this result because of the large deviations
and the small number of cases in certain grades. In the early grades
our medians are slightly higher than Ferguson’s.


TABLE IV. MEDIAN SCORES FOR GRADES BY FERGUSON METHOD OF SCORING

Grade            AM     AD     Number of cases
Junior I         8.4 ± 1.4           10
Senior I         8.4 ± 2.7           27
Junior II        9.7 ± 2.3           39
Junior III      13.8 ± 1.8           28
Senior III      12.6 ± 3.3           17
Junior IV       13.5 ± 2.8            8
Senior IV       12.3 ± 4.2            3
Auxiliary           ..                2
Total                               134

C. Sex Differences. Table V allows for a comparison of the median
age scores of boys and of girls. The boys show a decided superiority
at each age level. There is no difference between the boys’ eight
and nine year scores, nor between their eleven and twelve year scores.
Ferguson¹ finds “that the boys do as well as girls on the boards or
slightly better.”


TABLE V. SEX DIFFERENCES

                  Boys                            Girls
Year     AM     AD    Number of cases    AM     AD    Number of cases
 8      10   ± 1.8         12            7.8 ± 1.8         12
 9      10   ± 2.7         11            7.4 ± 2.2         14
10      11.1 ± 2.6         15            9.3 ± 2.2         13
11      13.5 ± 2.6         16           10.7 ± 1.9          9
12      13.2 ± 4           12           12.2 ± 3.5         20
Total                      66                              68

D. Correlation with Linguistic Tests. The correlation of scores
on the National Group Intelligence Test with Form Board scores by
the Pearson product-moment method is .33 ± .06 (seventy-five cases)
and with Pintner-Cunningham scores is .09 ± .13 (twenty-seven
cases).

E. Reliability. This was measured by correlation of first and
second scores obtained in the manner indicated above. The correlation
proved to be high, .90 ± .02 (twenty-five cases). In cases
where there was a disparity between the first and second scores, there 
was an increase in every case but one. This may be in part a practice 
effect and in part due to chronological age growth. 

Summary.—By the Ferguson method of scoring, the age incre- 
ments are very slight especially when interpreted in the light of 
deviations. It is apparent that there is a marked variability of the 
performances of children at each age level. There are significant 
differences in scores only between the extreme ends of the range chosen 
for investigation. Sex differences are present in favor of the boys. 
The test score is very reliable. The Ferguson series is statistically 
reliable but the method of scoring does not show satisfactory age 
increments. 


ALTERNATIVE METHODS OF SCORING 


Our next enquiry was to determine how far the lack of age incre- 
ments was due to the scoring rules adopted by Ferguson. It is 
impossible to discover from Ferguson’s article the plan on which he 
allotted scores to certain time intervals and it was considered possible 
that some other basis of scoring the time record might show clearly 
defined age-increments and thus meet our criterion of validity. Two 
experiments were made in this connection. 

The times in seconds taken by all subjects on each board were
arranged in frequency distributions. The time range was then 
divided into five equal sigma intervals. The time records for the 
hundred and thirty-four cases were re-scored. Excepting for the 
eight year group there was a constant increase in score with each age 
level on each board and on the whole series combined. The differences 
in age norms were, however, slight and the standard deviations were 
large. 

Another table of values was then obtained by dividing the time
range into twenty equal standard deviation intervals. It was decided 
to consider Board 1 as a “shock-absorber” and to take the total score
as the sum of the scores on Boards 2 to 6 inclusive. Table VI shows
the distribution of time records for each board. 


TABLE VI.—DISTRIBUTION OF TIME SCORES BY BOARDS 

Time-interval in seconds | Board 1 | Board 2 | Board 3 | Board 4 | Board 5 | Board 6 
(Rows run in ten-second intervals from 0-9 to 290-299, followed by 300 and over; the cell frequencies are not legible in this copy.) 

An inspection of Table VI reveals a source of serious statistical 
error in the large number of undistributed scores. All persons who 
failed to complete the test are assigned a score of zero. Obviously 
such a score is meaningless since it is given equally to those who would 
have finished in five seconds and those who would not have completed 
the board in unlimited time. In the second place the artificial fore- 
shortening of the distribution makes the apparent deviation much 
less than it really is. What is more unfortunate still, we cannot esti- 
mate the amount of deviation; all that we can say is that it is cer- 
tainly greater on Boards 4, 5 and 6 than here appears. The small 
number of cases of completions on these boards makes the median 
quite unreliable. 
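
The foreshortening effect described above can be shown with a small numerical sketch. The "true" times below are invented purely for illustration (they are not data from this study); only the five-minute limit is taken from the text. The point is simply that the spread computed from completions alone is smaller than the spread of the full set of times.

```python
import statistics

TIME_LIMIT = 300  # seconds; the five-minute limit per board mentioned in the text


def censor(times, limit=TIME_LIMIT):
    """Return the times completed within the limit and the number of failures."""
    completed = [t for t in times if t < limit]
    failures = len(times) - len(completed)
    return completed, failures


# Hypothetical times that ten subjects would need with unlimited time.
true_times = [45, 80, 120, 150, 210, 260, 340, 420, 600, 900]

completed, failures = censor(true_times)
print("undistributed cases:", failures)
print("SD of true times:", round(statistics.pstdev(true_times), 1))
print("SD of completions only:", round(statistics.pstdev(completed), 1))
print("median of completions only:", statistics.median(completed))
```

In practice the true times are unknown, which is exactly why the amount of understatement cannot be estimated from the time-limited records.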

In a recent publication¹ norms, based on a modification of the 
Ferguson Series by Plant, are given. It is unfortunate that we do 
not have the time distributions obtained by Plant and his associates, 
for even with the modified rules it seems extremely likely that the same 
sources of error were present in his data as in those here reported. 
Our sample was chosen too carefully to make any very significant 
change in the trend of time distributions. 

Our interpretation of these findings is that it is pure fiction to speak 
of norms under such circumstances. No amount of statistical finesse 
can compensate for the defects noted above. Had it not been for 
the publication of Plant’s norms we would not have gone further 
than this, but his table (presumably based on sigma or PE values) 
gives an apparent sanction to the scale that we do not believe to be 
warranted. In Table VII we give the median scores and average 
deviations for our time records when re-scored in sigma values. 


TABLE VII.—MEDIAN SCORES AND AD FOR AGE LEVELS (ALTERNATIVE METHOD 
OF SCORING) 

        Number                                Board 
Age     of cases       2            3            4            5            6           Total 
 8        24       7.0 ± 3.7    2.9 ± 3.1    2.2 ± 2.8     .2 ± 1.2     .7 ± .15    13   ± 7.3 
 9        25       6.3 ± 3.8    3.7 ± 3.8    3.7 ± 4.5    .08 ± .3     .04 ± .05    13.8 ± 9.7 
10        28       8.9 ± 4.2    5.2 ± 4.0    6.3 ± 5.7    1.1 ± 2.2     .7 ± 1.8    22.4 ± 10.5 
11        25      10.7 ± 3.5    6.2 ± 3.9    7.0 ± 4.9    3.0 ± 4.4    2.2 ± 3.8    29.7 ± 14.7 
12        32      10.0 ± 3.2    7.0 ± 4.8    7.4 ± 5.5    2.9 ± 4.8    3.0 ± 4.8    20.4 ± 17.3 
Total    134 








By this method of scoring the eight and nine year groups still have 
practically identical scores. The age increments from nine to ten, 
and ten to eleven, on each board, and for the total series are very 
slight, when the medians are interpreted in the light of their deviations. 
There is no significant increase from eleven to twelve. The use of 
these larger units for scoring allows for a somewhat clearer definition 
of certain age differences but does not overcome the extreme vari- 
ability of scores at each age level. Even if the scores had been valid, 
the test series would not be a useful one for clinical purposes because 
of the extreme variability of the performance of children of the same 
chronological age. 


A QUALITATIVE ANALYSIS OF PERFORMANCE 


With a view to obtaining as complete a knowledge of the child’s 
performance as was possible, it was decided for the first ten or fifteen 
cases to observe and make notes concerning the attitude and type of 
performance of each subject. These comments were then classified 
as they appear in the Qualitative Analysis blank. Some of the items 
that appear in this blank have proven to be of little value. It may 
be important to comment on reasons for their rejection. 

One Hand or Both.—Some subjects used both hands throughout, 
while those who used only one hand, at any time, employed it only 
for the first and sometimes the second board. This was, no doubt, 
because of the simplicity of the tasks at this level for all found it 
necessary to use both hands for the remainder of the series. 

With or without Discussion.—Discussion rarely occurred. 

Trial and Error or Planning.—A decision on this point while easy 
in a few cases is usually purely arbitrary and subjective. Miss 
Schmidt’s rule¹ to count that performance as a trial-and-error 
performance in which there were more than a certain number of moves 
seemed to us to be a refinement that introduced too many sources 
of error on the part of the examiner, e.g., in defining a move, to warrant 
its adoption as a criterion. It would seem that some subjects were 
actually planning though they were directing all their efforts to the 
wrong block. The fault here would not be poor method but poor 
idea of form. A further difficulty is that in some cases both methods 
appear to be employed alternately on the same board. 

Satisfied with Blocks Not in Position.—This rarely occurred but 
when it did, the subject was reminded that the blocks fitted the holes 
exactly. There is no provision in Professor Ferguson’s directions for 
dealing with such an occurrence and it is uncertain whether the subject 
should be aided to this extent. 

Lingering over Completed Holes.—This also occurred rarely and 
it was not considered worth while to make a summary of the data. 

Attitude to Unfinished Board.—When five minutes had elapsed 
and the board was as yet incomplete the subject was told that we 
would not spend any more time on that board but try the next one. 
Usually a word of praise or encouragement was added so that the 
child would not feel his failure too keenly. In deference to the 
examiner’s wishes a child might appear quite passive, when his real 
desire was to complete the task set him. This involves so large a 
possibility of error that scoring loses its significance. In some 
cases a child would slip another block into place before the board 
was removed, in which case a judgment could be made with more 
confidence. 

The remaining qualitative data were summarized and appear in 
Table VIII. For purposes of comparison it was decided to take the 
five best time scores and the five poorest time scores at each age 
level and compare the results of these pupils on the qualitative scale. 
It is necessary to give the meaning attributed to these terms. 

Failure to invert blocks here means that the subject had the correct 
block at the hole but did not invert it before discarding it on the table. 
This was also scored as “not persistent.” 

Difficulty with angle was considered as a special difficulty, and 
refers to hole one on Board 2. Many subjects tried to place these 
two blocks, so that the line of division was at right angles to the 
board, instead of being diagonally placed. This fact was noted when 
the subject took over thirty seconds to complete it, or discarded 
the blocks temporarily. 

Persists in Placing Wrong Block in Hole.—This was indicated 
when the subject either left the wrong block in a hole, or repeatedly 
(more than once) tried to put the same wrong block in the same hole. 
The other items on the Blank are self-explanatory. 

An examination of Table VIII shows that at the eight year level 
there is practically no difference in the qualitative scores, the best 
pupils obtaining a total of forty-five and the poorest of forty-six. 
The nine, ten, eleven and twelve year age levels, however, show a 
wide difference as estimated by this qualitative analysis between 
the two types of subjects. Finally—repeating impossible moves, 
moving blocks correctly placed, failure to invert blocks, difficulty 
with angles, persisting with wrong blocks and not persisting with 
right block, occurred with greater frequency among those who obtained 
a low numerical score based on time, than those who obtained the 
highest time scores. We may then conclude that these factors are some 
of the causes determining a low score on this series of form boards. 
They are the only ones investigated by us on which reliable judgments 
can be made. 


























TABLE VIII 

                                          Age 8          Age 9          Age 10         Age 11         Age 12 
                                        Best Poorest   Best Poorest   Best Poorest   Best Poorest   Best Poorest 
Repeating impossible moves                7    16        5    19        2    17        2    18        3    15 
Moving blocks correctly placed           ..     1        0     1        0     2        1     3        4     1 
Failure to invert blocks                  7     7        2     6        2    14        2     7        5    12 
Difficulty with angle                     2     3        0     4        1     3        0     2        2     3 
Persists in placing wrong block in hole   4     7        6    18        3    15        1    16        3    16 
Does not persist with correct block      12    12        7    21        8    21        4    16       10    21 
Total                                    45    46       20    69       16    72       10    62       27    68 




















In addition to the comments included on the Qualitative Analysis 
Blank a striking fact was apparent as regards the degree of interest 
displayed by children of different age levels in the form board tests. 
Children of eight and nine showed definite pleasure in the task and 
attacked each board as though they were playing an interesting game, 
but the children of eleven and twelve years displayed quite a different 
attitude. They were either plainly bored with the task or attacked 
it with calm and grim facial expressions, as though it were a difficult 
problem to be solved. 


SUMMARY 


1. The Ferguson Form Board Series was given to one hundred 
thirty-four pupils picked as a random sample of children aged eight 
to twelve years inclusive. Approximately equal numbers at each 
age level were tested by a single trained examiner. There were 
sixty-six boys and sixty-eight girls. The time records were transmuted 
into scores by the Ferguson Standards, and by an alternative method 
devised by the writers. Age, sex and grade norms were derived for 
the Ferguson Scores. A qualitative analysis of the performance was 
made by the examiner. 


2. Ferguson scores show no significant increments at successive 
chronological age levels. 

3. Ferguson scores were shown by retests to be reliable (.90 ± .02). 

4. At each chronological age level there are very marked individual 
differences in the time required to complete any board. 

5. An alternative method of scoring shows age increments from 
nine to ten and ten to eleven, and fails to show any from eight to nine 
or eleven to twelve. The deviations are so large as to make the 
increments that do appear unreliable. 

6. The correlation of Ferguson scores and linguistic test scores 
is low. 

7. An analysis of certain forms of behavior during the examination 
can be made and scored objectively. These data when scored show 
significant differences between those who make high and low scores 
and probably are in part the reasons for the low scores. 


CONCLUSIONS 


1. The Ferguson Form Board Series tests some functions with a 
high degree of reliability, but these functions do not develop regularly 
with chronological age development. The variability of performance 
at each age level is so great as to render age-norms meaningless. The 
use of more refined methods of scoring does not alter this situation. 

2. The correlation with linguistic tests is too low to allow its use 
as an alternative to linguistic tests. It cannot be used with profit, 
therefore, in clinics primarily engaged in educational classification 
and diagnosis. 

3. The next step in the investigation of this series is to determine 
the results that would be obtained with practically unlimited time 
allotments. Only in this way can the actual variation in the 
performance of age groups be determined. 

4. Observations on performance during a test should be limited to 
those factors which are objective. 


Blank No. 1. 
Ferguson Form Board 
Qualitative Analysis of Performance 

Comments                                                  Number of board 
                                                          1 | 2 | 3 | 4 | 5 | 6 
I. Method of attack: 
    (a) One hand or both 
    (b) With or without discussion 
II. Procedure: 
    (a) Repeating impossible moves (R) 
    (b) Trial and error, or planning (T and E, P) 
    (c) Moving blocks correctly placed (M) 
    (d) Rate of movement: fast (f), medium (m), slow (s) 
    (e) Satisfied with blocks not in position (S) 
    (f) Lingering over completed holes (L) 
    (g) Fitted blocks together outside of hole (T) 
III. Special difficulties: 
    (a) Note hole: failure to invert (FI); difficulty with angle (A) 
    (b) Persists in attempt to put block in wrong hole (P) 
    (c) Does not persist when has right block at hole (NP) 
IV. Attitude to unfinished board: 
    (a) Willing to give in (W) 
    (b) Shows desire to continue (C) 
    (c) Note holes completed 


THE PERMANENCE OF LEARNING IN ELEMENTARY 
BOTANY 


PALMER O. JOHNSON 


University of Minnesota 


THE PROBLEM 


The study reported here was an attempt to determine (1) the 
extent of retention of the botanical information acquired by certain 
students in the course in General Botany 4-5-6 at the University; and 
(2) the relationship between the amount retained and the initial 
amount possessed. 


SOURCES AND METHOD 


There was given during the year 1926-1927 an objective final 
examination at the end of each quarter’s work in General Botany 
4-5-6. Accordingly, there were three final objective examinations 
which covered the content of the course. The aggregate of items 
comprising these tests was 587, of which 265 were of the true-false 
and 322 of the completion type. By means of this test one had a 
measure of what each student had done at the time of completing 
each quarter’s work. Then, by giving the same test at the beginning 
of the fall quarter of 1927-1928, a measure should be obtained of the 
retention for intervals of nine, six, and three months, which had 
elapsed since the student had completed General Botany 4, 5, and 6, 
respectively. 

One of the criteria of a good examination is its administrative 
feasibility. Although, from the standpoint of securing greater validity 
it might have been desirable to have given the composite test of 587 
items, nevertheless, it was necessary to select from this composite, 
items in such numbers as to constitute a representative and compre- 
hensive examination, capable of being given in two hours. Other 
things being equal, that test is best in which the items range in regular 
intervals from very easy to very difficult ones. The procedure to 
secure this arrangement is by recording the frequency of error on each 
of the items of the test. In this way the relative difficulty of the 
respective items may be determined. For the selection of the items 
of the test in question, there were available final examinations for 
fifty-four students. By an analysis of these papers, 126 true-false 
statements and 172 completion items, or a total of 298 items, were 
selected and arranged in order of difficulty. Approximately an equal 
number of items were selected from General Botany 4, 5, and 6, 
respectively. The three portions of the test were also made as equal 
in difficulty as possible. 

In addition to administrative feasibility, validity and reliability 
are principal criteria of good examinations. The validity of the test 
used in this study was determined by correlating the scores attained 
on it by the examinees and their grades in General Botany 4-5-6. 
The basis for the determination of the students’ grades was secured by 
adding one-third of the grade given in laboratory work, one-third of 
the grade on quizzes (objective) and one-third of the score on the final 
examination (objective). The coefficient of correlation between these 
two measures was .87 ± .02 (N = 54). 

The reliability of the botany test as determined by “splitting” it 
into random halves and estimating by the Spearman-Brown formula 
was .93 ± .01. 
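
As a reminder of what the Spearman-Brown estimate does, the step from a half-test correlation to a full-test reliability may be sketched as follows. The half-test correlation itself is not reported in this study; the value printed by the sketch is merely the one implied by working backward from .93, and the function names are illustrative.

```python
def spearman_brown(r_half, factor=2.0):
    """Estimated reliability of a test lengthened `factor` times,
    given the correlation r_half between comparable parts."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def implied_half_correlation(r_full):
    """Invert the two-halves case: the half-test correlation that
    would step up to r_full."""
    return r_full / (2.0 - r_full)


r_half = implied_half_correlation(0.93)  # implied, not reported in the source
print(round(r_half, 3))
print(round(spearman_brown(r_half), 2))
```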


THE EXTENT OF RETENTION OF BOTANICAL INFORMATION 


The botany test previously described was given during the first 
week of the Fall quarter, 1927. Two hours were devoted to the test 
and these general directions were given: “The aim of this examination 
is to give you an opportunity to show what you have retained from 
your previous contact with botany. Read the directions for each 
part carefully and be sure you understand what you are to do before 
you begin.” In all, one hundred and twenty-eight students took the 
test. The scholastic records of these students were sought in the 
registrar’s office to find out when each had taken the course in ele- 
mentary botany and the grade which he had achieved. Of the total 
number, some had taken the course in elementary botany elsewhere, 
while others had not yet taken it. As a result, three groups of students 
were obtained, each restricted in number. One group of forty-two 
students had taken elementary botany in 1926-1927; another, twenty- 
nine in number, in 1925-1926; and the third, twenty-two in number, 
had taken it in 1924-1925. There had elapsed, then, three, fifteen, 
and twenty-seven months, respectively, for these groups from the time 
the course in elementary botany had been completed until the botany 
test was taken. Only those students were considered who had taken 
the course at the University of Minnesota. 





Permanence of Learning Botany 


[Figure: Immediate recall and delayed recall scores on Botany Test 4-5-6 (24 cases; test for delayed recall given 3 months after completion of course).] 

The chief concern in this discussion is with the group of students 
who had taken elementary botany in 1926-1927 for, as previously 
mentioned, there were available for these students the scores on that 
part of the test made when any particular quarter’s work had been 
completed. Of this group of forty-two students all had completed 
General Botany 4-5, while twenty-nine had completed General Botany 
4-5-6. 

The scores of twenty-four students¹ on the immediate recall and 
delayed recall tests in General Botany 4-5-6 are given in Table I. 
The respective scores are given for the true-false test (Botany 4-5-6), 
the completion test (Botany 4), the completion test (Botany 5), the 
completion test (Botany 6), and the entire test (Botany 4-5-6). The 
mean score, standard deviation and probable error of the mean are 
given for the immediate and delayed recall scores for each part of the 
test and for the test as a whole. The means of the total scores and 
their standard deviations for this group of twenty-four students, 
attained on the immediate recall and delayed recall tests were 207.4 ± 
29.4 and 117.3 ± 46.6, respectively. The loss in retention in per cent 
was 43.4. Figure 1 shows these facts graphically. 
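
The probable errors of the means quoted with these scores can be checked with a short sketch. It assumes the conventional formula PE = .6745 SD / √N, which the text does not state explicitly but which reproduces the printed values for the total scores to within rounding; the function name is illustrative.

```python
import math


def probable_error_of_mean(sd, n):
    """Probable error of a mean, assuming PE = 0.6745 * SD / sqrt(N)."""
    return 0.6745 * sd / math.sqrt(n)


# Total scores, twenty-four students: SDs taken from the text above.
print(round(probable_error_of_mean(29.4, 24), 2))  # immediate recall
print(round(probable_error_of_mean(46.6, 24), 2))  # delayed recall
```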


TABLE I.—SCORES OF STUDENTS ON IMMEDIATE RECALL AND DELAYED RECALL 
TESTS IN GENERAL BOTANY 4-5-6 (1926-1927) 

             True-false test     Completion test     Completion test     Completion test     Total score 
             Botany 4-5-6        Botany 4            Botany 5            Botany 6            Botany 4-5-6 
                                 (delayed recall     (delayed recall     (delayed recall 
                                 interval nine       interval six        interval three 
                                 months)             months)             months) 
             Immed.   Delayed    Immed.   Delayed    Immed.   Delayed    Immed.   Delayed    Immed.   Delayed 
             recall   recall     recall   recall     recall   recall     recall   recall     recall   recall 
Mean          75.8     45.9       49.3     27.1       41.4     22.6       40.9     21.8      207.4    117.3 
SD           ±15.4    ±21.5      ± 7.8    ± 9.8      ± 5.5    ± 9.6      ± 6.7    ±11.8      ±29.4    ±46.6 
PE (mean)    ± 2.1    ± 2.9      ± 1.07   ± 1.3      ±  .75   ± 1.32     ±  .9    ± 1.6      ± 4.04   ± 6.4 


































Table II gives the scores of thirty-six students on the immediate 
recall and delayed recall tests in General Botany 4-5. The means of 
the total scores and their standard deviations for this group of 
thirty-six students on the immediate and delayed recall tests were 141.8 ± 
19.9 and 74.1 ± 29.1, respectively. The loss in retention in per cent 
was 47.8. 

1 The scores of five of the group of twenty-nine students who had completed 
General Botany 4-5-6 were not incorporated in these data, since there was evidence 
that they had not exerted their best efforts in the latter part of the examination. 


TABLE II.—SCORES OF STUDENTS ON IMMEDIATE RECALL AND DELAYED RECALL 
TESTS IN GENERAL BOTANY 4-5 

             True-false test     Completion test     Completion test     Total score 
             Botany 4-5          Botany 4            Botany 5            Botany 4-5 
                                 (delayed recall     (delayed recall 
                                 interval nine       interval six 
                                 months)             months) 
             Immed.   Delayed    Immed.   Delayed    Immed.   Delayed    Immed.   Delayed 
             recall   recall     recall   recall     recall   recall     recall   recall 
Mean          52.7     29.6       48.1     24.6       40.9     19.9      141.8     74.1 
SD           ± 9.7    ±13.01     ± 8.4    ±10.4      ± 5.2    ± 9.4      ±19.9    ±29.1 
PE (mean)    ± 1.08   ± 1.46     ±  .94   ± 1.16     ±  .59   ± 1.04     ± 2.25   ± 3.27 





























Table III shows the honor point ratios, grades in botany and scores 
on the botany test of the group of twenty-two students who took 
General Botany 4-5-6 in 1924-1925 and the group of twenty-nine 
students who took it in 1925-1926. The median scores on the botany 
test for these two groups of students were forty-nine and fifty-three, 
respectively. There were no records for these students on the 
immediate recall test, so an accurate measurement of their retention was not 
possible. Some index of their botanical information at the time of 
completing the course is afforded by the grades received in General 
Botany 4-5-6. The coefficient of correlation (Rank-difference 
Method) between the scores attained on the botany test by the group 
of students who took botany in 1924-1925 and the corresponding 
grades attained in General Botany 4-5-6 was .75 ± .06; between the 
same measures for the group of students who took botany in 1925-1926, 
.81 ± .05. This would indicate that the students who had the most 
botanical information at the time of completing the course are very 
likely to retain the most after intervals of fifteen and twenty-seven 
months. 
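
For readers unfamiliar with the rank-difference method, a minimal sketch follows. It uses the usual formula ρ = 1 − 6Σd² / [n(n² − 1)], with tied values given their mean rank (the formula is then only approximate). How the three quarterly grades were combined into a single ranking is not stated in the text, so the grade values and scores below are hypothetical and serve only to show the mechanics.

```python
def average_ranks(values):
    """Rank values from 1 (largest) downward; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_difference_rho(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical grade points and test scores for six students.
grade_points = [3.0, 2.7, 2.0, 2.0, 1.3, 0.7]
test_scores = [120, 95, 60, 88, 40, 45]
print(round(rank_difference_rho(grade_points, test_scores), 3))
```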

In determining the achievement of the student in a course in 
elementary botany, it is obviously desirable to have some measure of 
the amount of botanical knowledge which he possesses at the time of 
his entrance to the course. The botany test was given to one hundred 
and twenty-six students beginning General Botany 4-5-6 in the Fall 
quarter of 1927-1928. The median score for this group was 5.5. 
As measured by this test, therefore, the botanical knowledge of 
students entering the course is apparently negligible. 


TABLE III.—HONOR POINT RATIOS, GRADES IN BOTANY AND SCORES ON BOTANY 
TEST OF STUDENTS WHO TOOK GENERAL BOTANY 4-5-6 IN 1924-1925 
AND 1925-1926 

                      1924-1925                              1925-1926 
Student   Honor    Botany grade     Score on     Honor    Botany grade     Score on 
No.       point    Bot.  Bot.  Bot. botany test  point    Bot.  Bot.  Bot. botany test 
          ratio     4     5     6   4-5-6        ratio     4     5     6   4-5-6 
  1       2.839     A     A     B     138        2.724     A     A     A     192 
  2       2.827     A     A     A     118        2.452     B     A     B     177 
  3       2.534     A     A     A     115        1.457     C     B     B     165 
  4       1.216     C     C     C      91        2.709     A     A     A     127 
  5       2.333     A     A     A      80        2.816     A     A     A     100 
  6       1.753     B     B     B      71        2.204     B     B     B      97 
  7       1.235     C     C     D      67        2.337     B     B     B      97 
  8       2.186     B     B     B      65         .980     C     C     C      87 
  9       1.585     B     B     C      61        1.140     B     B     C      73 
 10        .865     C     C     D      53        1.453     B     B     C      65 
 11       1.200     C     B     C      52        1.274     B     C     C      58 
 12       1.25      B     B     B      46         .981     B     B     C      56 
 13       1.496     C     D     D      45         .541     C     D     D      55 
 14        .704     D     D     F      41         .983     C     D     D      53 
 15        .325     C     D     D      38        1.000     C     C     C      53 
 16       1.281     C     D     C      37         .953     C     C     C      48 
 17       1.5       B     D     C      34         .788     C     C     C      46 
 18       1.000     C     C     C      30        1.595     C     C     C      44 
 19        .279     E     D     D      23        2.201     B     B     B      40 
 20       1.087     C     C     C      21        1.268     B     C     C      38 
 21        .778     D     D     D      19         .723     B     C     C      34 
 22        .926     D     D     F      10         .735     C     D     D      34 
 23                                               .415     D     D     D      34 
 24                                               .713     D     C     F      32 
 25                                               .487     D     D     F      29 
 26                                               .949     C     D     D      27 
 27                                              1.300     C     D     D      25 
 28                                              1.275     D     D     D      21 
 29                                              1.087     C     C     D      13 
 Q1                                    34                                     34 
 Md                                    49                                     53 
 Q3                                    71                                     87 

As a basis for comparing the retention of botanical information of 
the three groups of students previously discussed, the data presented 
in Table IV were compiled. From an analysis of the mean botany 
grade, the mean honor point ratio and the mean percentile rank in 
intelligence it would appear that these three groups are fairly 
comparable. Within the limitations imposed by the number of cases and the 
validity of the technique employed in securing the data, a tentative 
generalization might be made that the median freshman entering the 
course in General Botany 4-5-6 attains a score of 5.5 on the botany test 
given; this score is raised to 205 during the progression through the 
course; then three, fifteen, and twenty-seven months after completion 
of the course the median scores are represented by the measures 110.5, 
53, and 49, respectively. These data are presented graphically in 
Figure 2. The percentage of loss in retention represented by the drop 
of the median score from 205 to 110.5 is 46.1; from 205 to 53, 74.1; 
and from 205 to 49, 76.1 (assuming that the groups are directly 
comparable). The rate of loss would, therefore, appear to be quite rapid 
during the three and fifteen months after completing the course 
followed by a gradual decline afterwards. 

[Figure: Interquartile ranges and median scores on Botany Test 4-5-6 of students at various intervals after completing General Botany 4-5-6.] 
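
The percentages of loss quoted for the median scores follow from simple arithmetic, sketched below. The function name `percent_loss` is illustrative; the medians are those given in the text, and the comparison across the three groups rests on the same assumption of comparability that the text makes.

```python
def percent_loss(initial, retained):
    """Loss in retention as a percentage of the initial score."""
    return 100.0 * (initial - retained) / initial


INITIAL_MEDIAN = 205.0
# Months after completion of the course -> median score on the botany test.
later_medians = {3: 110.5, 15: 53.0, 27: 49.0}

for months, median in later_medians.items():
    print(months, "months:", round(percent_loss(INITIAL_MEDIAN, median), 1), "per cent lost")
```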


TABLE IV.—COMPARISON OF SCORES ON BOTANY TEST OF STUDENTS AT VARIOUS 
INTERVALS AFTER COMPLETION OF GENERAL BOTANY 4-5-6 

General Botany 4-5-6                                                  Mean¹     Mean²     Mean³ 
               Months after   Number                                  botany    honor     percentile rank 
Year taken     completion     of cases     Q1       Md       Q3       grade     point     in intelligence 
                                                                                ratio     tests 
1927-1928      Beginning 
               botany           126        ..       ..       ..        ..        ..        48.2 
1926-1927       0                24       188.0    205.0    224.5     2.65      1.264      58.2 
1926-1927       3                24        91.0    110.5    133.0     2.65      1.264      58.2 
1925-1926      15                29        34.0     53.0     87.0     2.8       1.318      50.0 
1924-1925      27                22        34.0     49.0     71.0     2.83      1.418      57.6 

1 Grade A = 1; B = 2; C = 3; D = 4; E = 5; F = 6. 
2 A = 3 honor points; B = 2; C = 1; D = 0; F = −1. 
3 Based on Minnesota college ability tests. 


THE RELATIONSHIP BETWEEN THE AMOUNT OF BOTANICAL INFORMA- 
TION RETAINED AND THE INITIAL AMOUNT POSSESSED 


The coefficients of correlation presented indicate a considerable 
interdependence between the initial amount of botanical information 
possessed by the student at the time of completing the course in 
General Botany 4-5-6 and the amount retained at the end of certain 
intervals of time. These coefficients do not, however, answer the 
question whether the individual who had the most botanical 
information at the time of completing the course in elementary botany retains 
relatively the most after certain intervals of time. A formula 
developed by Harris¹ for dealing with certain biological data may assist 
in answering this question. The formula is: 

\[
r_{xz} = \frac{r_{xy} - \dfrac{V_x}{V_y}}{\sqrt{1 - r_{xy}^{2} + \left(r_{xy} - \dfrac{V_x}{V_y}\right)^{2}}}
\]

From the data given in Table II we have the following values: 
x = score on the immediate recall test; y = score on the delayed recall 
tests; and z = the deviation of the y of any individual from its most 
probable value. The problem is to determine whether the relative 
value of y changes from the lower to the higher grades of x. 

\[
\begin{aligned}
M_x &= 141.8 & M_y &= 74.1 \\
SD_x &= 19.9 & SD_y &= 29.1 \\
V_x &= \frac{100\,SD_x}{M_x} = 14.034 & V_y &= \frac{100\,SD_y}{M_y} = 39.27 \\
r_{xy} &= .754 \pm .048 & \frac{V_x}{V_y} &= .3574 \\
r_{xz} &= .517 \pm .08
\end{aligned}
\]


From the high relation denoted by the value, r_xy = .754 ± .048, 
and the marked or substantial relation denoted by the value, r_xz = 
.517 ± .08, it appears that as far as these thirty-six individuals in 
question are concerned, those students who have the most botanical 
information as measured by the botany test at the time of completing 
Botany 4-5 are likely to retain not only absolutely more but relatively 
more of this information, after an interval of six months. 

Similarly, applying the formula to the data given in Table I, we 
have: 


\[
\begin{aligned}
M_x &= 207.4 & M_y &= 117.3 \\
SD_x &= 29.4 & SD_y &= 46.6 \\
V_x &= 14.17 & V_y &= 39.73 \\
r_{xy} &= .84 \pm .04 & \frac{V_x}{V_y} &= .3567 \\
r_{xz} &= .665 \pm .07
\end{aligned}
\]


1 Harris, J. Arthur: The Correlation between a Variable and the Deviation of a 
Dependent Variable from Its Probable Value. Biometrika, Vol. VI, pp. 438-443. 

It appears, then, that the same conclusion drawn concerning the 
preceding group holds true for this group as well. That is, from the 
high relation denoted by the value, r_xy = .84 ± .04, and the marked 
or substantial relation denoted by the value, r_xz = .665 ± .07, it is 
apparent that for this group of twenty-four students, those students 
who have the most botanical information as measured by the botany 
test at the time of completing General Botany 4-5-6, are likely to 
retain not only absolutely but relatively more of this information after 
an interval of three months. 
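
The two values of r_xz reported in this section can be recomputed from the printed means, standard deviations and correlations. The sketch below assumes the Harris formula in the form r_xz = (r_xy − V_x/V_y) / √[1 − r_xy² + (r_xy − V_x/V_y)²], with V denoting the coefficient of variation; this form reproduces the printed results. The function name is illustrative.

```python
import math


def harris_r_xz(r_xy, mean_x, sd_x, mean_y, sd_y):
    """Correlation between x and the deviation z of y from its probable
    value, computed from r_xy and the two coefficients of variation."""
    v_x = 100.0 * sd_x / mean_x
    v_y = 100.0 * sd_y / mean_y
    diff = r_xy - v_x / v_y
    return diff / math.sqrt(1.0 - r_xy ** 2 + diff ** 2)


# Table II data (General Botany 4-5, thirty-six students).
print(round(harris_r_xz(0.754, 141.8, 19.9, 74.1, 29.1), 3))
# Table I data (General Botany 4-5-6, twenty-four students).
print(round(harris_r_xz(0.84, 207.4, 29.4, 117.3, 46.6), 3))
```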


SUMMARY 


1. The instrument used for measuring retention was a botany test developed 
from previous final objective examinations given in General Botany 4-5-6. The 
validity of this test based upon instructors’ grades in General Botany 4-5-6 as a 
criterion was found to be .87 ± .02. The reliability as determined by the “split” 
test method and estimated by the Spearman-Brown formula was .93 ± .01. 

2. The median score on the botany test of one hundred twenty-six students 
upon entering the course in General Botany 4-5-6 was 5.5. 

3. The median score on the botany test of twenty-four students as determined 
by the results from the components of the test given at the time of completing each 
quarter’s work in General Botany 4-5-6 was 205; three months after completing 
the course, the median score for this group was 110.5. 

4. The median score on the botany test for a group of twenty-nine students 
fifteen months after completing General Botany 4-5-6 was 53; for a group of 
twenty-two students twenty-seven months after completion of the course, 49. 

5. The mean score on the immediate recall test in General Botany 4-5-6 for a 
group of twenty-four students was 207.4; the mean score on the delayed recall test 
(given three months after completing the course) was 117.3. The loss in retention 
was 43.4 per cent. 

6. The mean score on the immediate recall test in General Botany 4-5 for a 
group of thirty-six students was 141.8; the mean score on the delayed recall test 
given six months after completing General Botany 4-5 was 74.1. The loss in 
retention was 47.8 per cent. 

7. The students who have the most botanical information (as measured by the 
botany test) at the time of completing General Botany 4-5 or General Botany 4-5-6 
are likely to retain not only absolutely but relatively more of this information after 
the lapse of six or three months in time. 


IMPLICATIONS OF THE FINDINGS 


Certain types of preparation have perhaps always been mandatory 
for admission into any unit of our educational organization. As 
various curricula have been developed within any given unit, there 
have been set up pre-requisite courses the completion of which is 
necessary before certain sequent courses may be pursued. The 
implication has been that the student must possess certain information 
or other outcomes of training before he is competent or equipped to 
pursue advanced courses in a particular field. It may be that certain 
courses arbitrarily determined as pre-requisites for subsequent courses 
at some time in the past have maintained their status by virtue of this 
early position rather than because of any propedeutic values which 
they may now possess. 

Although the securing of data concerning the permanence of 
learning is undoubtedly pertinent in the study of effective factors in 
many educational procedures, the writer has found it especially 
expedient in studies made to evaluate certain pre-requisite courses at 
the University of Minnesota. 
















RELIABILITY OF REPEATED GRADING OF ESSAY 
TYPE EXAMINATIONS 


WALTER CROSBY EELLS 


Stanford University 


The wide variability shown by different teachers in grading the
same examinations of the essay type is well known. The classic
experiments of Starch and Elliott on reliability of grading history,
English, and geometry papers have been repeated
in various forms.1 But there seems to have been little investigation
of the reliability of regrading of the same material by the same teachers.
Starch reports a brief investigation in which seven college instructors
were asked to regrade a set of ten of their own papers after intervals
of two weeks to four years. In this case each question was regraded
only once.2

The object of this paper is to report the results of an experiment
in regrading the same set of material after an interval of eleven weeks
by sixty-one different teachers, members of the author’s summer
quarter class in tests and measurements. Practically all were
experienced teachers. During the first week of the course they were given
mimeographed sheets containing answers from grammar school
geography and history papers, with instructions for grading them. The
graded papers were handed in the next day. The geography questions
were taken from Ruch’s experiment;3 those in history from an
experiment reported by Paulu.4 The questions used and instructions given
were as follows:


GEOGRAPHY


Name and locate five of the largest cities of the United States and name their leading 
industries, exports, and imports. 
ANSWER ONE 


Five of the largest cities in the United States is Detroite. An export is Cars. 
And industry is Manufactoring. Chicago is an important city and an export is 


1 Starch, D., and Elliott, E. C.: Reliability of Grading High School Work in
English. School Review, Vol. XX, 1912, pp. 442-447; in mathematics, ibid., Vol. XXI,
1913, pp. 254-259; in history, ibid., Vol. XXI, 1913, pp. 676-681. Ruch, G.
M.: “Improvement of the Written Examination,” Chicago, 1924 (Chap. III).

2 Starch, D.: Reliability and Distribution of Grades. Science, n.s. Vol.
XXXVIII, 1913, pp. 630-636.

3 Ruch, G. M.: “Improvement of the Written Examination.” Chicago, 1924.

4 Paulu, E. M.: “Diagnostic Teaching and Remedial Teaching.” Boston,
1924, p. 5.

Manufactored and canned goods. An industry of Chicago is meate packing.
New York is another important city. An industry of N. Y. is manufactoring. 
An export of N. Y. is manufactured goods. Pittsburge is an important city of 
U. S. An export is iron core. An industry of Pittsburge is manufactoring. 
Another important city of U. S. is New Orleans. An export of New Orleans is 
cotton. An industry of New Orleans is manufactoring. 


ANSWER TWO


The five largest cities of the United States are (1) navada (2) Arkansas. The 
leading industry of navade is manufacture and the leading industry of Arkansas is 
agriculture. The leading imports are manufacturing mostly. 


ANSWER THREE 


The 5 largest cities of United States are New York, Chicago, St. Louis, Boston, 
San Francisco. New York is in the State of New York. Chicago is in the state 
of Illinois. St. Louis is in the State of Missourri. Bostin is in the state of Mass. 
San Francisco is in the state of California. New York is a manufacturing city. 
Chicago is noted for meat packing center. St. Louis is noted for manufacturing 
textile goods and iron goods. San Francisco is noted for the packing of fruit. 
New York exports iron goods and imports wool, cotton, and other raw materials. 
Chicago exports meat and hides and grain and imports foods and grains. St. Louis 
exports manufactured products and imports raw materials. 


Grade each of the three answers above on a scale of 0 to 20, according to your 
best judgment of its merit, 20 being an answer ordinarily accepted by teachers as 
entirely satisfactory, and 0 being an answer practically without discernible merit. 


UNITED STATES HISTORY


1. Explain the Monroe Doctrine. What caused the President to make such a
declaration?


The Monroe Doctrine was a declaration made by President Monroe in his 
Message to Congress in 1823, in which he declared: 
(a) That the United States would not allow any European power to plant any 
new colonies on the American continent. 
(b) That we resolve to let the affairs of the old world alone. 


(c) That we determined that they in return should not meddle with the 
affairs of our country. 


It means that we consider ‘‘ America is for Americans.”’ 

Several South American countries and Mexico had declared themselves repub- 
lics, independent of Spain. This greatly displeased some of the European kings 
and the Czar of Russia as they wanted to gain more territory in America, so they 
formed an alliance to force these countries again under the rule of Spain. This 
was a thing that did not appeal to us so President Monroe made this declaration. 

2. What were the causes of the Mexican War? The Civil War? Spanish- 
American War? 

The causes of the Mexican War were the dispute over the boundary line, the 
invasion of Texas by Santa Anna, and the death of Crockett at the Alamo. 


Civil War was caused by the Southern States holding slaves which was against
the law of humanity.

The Spanish-American was caused by Spain’s brutal treatment of the Cubans
and the sinking of the battleship Maine which was blamed to Spain, and several 
other causes. Spain was beaten everywhere in a short time. The United States 
acquired the Philippines and freed Cuba. 

GRADE THE ABOVE TWO ANSWERS, EACH ON A SCALE OF TEN. 


At the close of the summer quarter, in connection with the final
examination, the members of the same class were asked to grade the
same questions again; also to state whether they were consciously
influenced by any memory of the grades given eleven weeks earlier.

Scatter diagram showing first and second gradings of a question in geography by the same
teachers (N = 61).


It will be instructive to exhibit the results in detail for one question, 
taking for this purpose the first one in geography. This may best 
be done by means of the scatter diagram, first gradings of the questions 
being shown horizontally, second gradings vertically. 

This figure shows an astonishing lack of agreement in judgment
of the same material by the same teachers. Only ten of the sixty-one
teachers gave the same grade to this question the second time that
they assigned the first time, as shown by the entries in the principal
diagonal. Sixteen teachers marked it above the average of the group
the first time, but below average the second time. Eleven others
marked it below average the first time, but above average the second
time. One teacher changed her estimate of the value of the answer
from five to fifteen. There were thirteen teachers who agreed on a
grade of ten upon first grading of the paper. Could better evidence
be asked of its true value than this agreement of thirteen judges,
marking independently? Unfortunately, however, when the same
thirteen judges expressed their judgment on the same material eleven
weeks later they varied in their estimates from five to fifteen! The
number of points change in judgment between first and second
gradings for all five questions is summarized in the following table. The
geography questions were graded on a scale of twenty, the history ones
on a scale of ten.


TABLE I.—NUMBER OF POINTS CHANGE BETWEEN FIRST AND SECOND GRADING
OF SAME MATERIAL BY SAME TEACHERS (N = 61)

                          Number of teachers making change
Number of points   Geography   Geography   Geography   History   History
change                 I           II          III         I         II
      0               10           55          14         21        17
      1               13            4           6         22        23
      2                7            0          12         12        10
      3                6            1           7          3         9
      4               10            0          10          3         1
      5                7            1           4          0         1
      6                2                        2
      7                2                        3
      8                1                        1
      9                2                        1
     10                1                        1
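
As a check on the tallies, each column of Table I should account for all sixty-one teachers. The Python sketch below transcribes the table and also computes an average change per question, a summary figure that the article itself does not report. The entry of 9 for a three-point change in History II is an assumption: it is the value that makes that column total 61.

```python
# Tallies from Table I: list index = points of change, value = number of teachers.
# The History II entry at index 3 is assumed to be 9 so that the column totals 61.
TABLE_I = {
    "Geography I":   [10, 13, 7, 6, 10, 7, 2, 2, 1, 2, 1],
    "Geography II":  [55, 4, 0, 1, 0, 1],
    "Geography III": [14, 6, 12, 7, 10, 4, 2, 3, 1, 1, 1],
    "History I":     [21, 22, 12, 3, 3, 0],
    "History II":    [17, 23, 10, 9, 1, 1],
}

def mean_change(tally):
    """Average number of points by which a teacher's second grade differed."""
    teachers = sum(tally)
    return sum(points * count for points, count in enumerate(tally)) / teachers

totals = {question: sum(tally) for question, tally in TABLE_I.items()}
means = {question: round(mean_change(tally), 2) for question, tally in TABLE_I.items()}
print(totals)
print(means)
```

The geography questions were graded on a scale of twenty and the history questions on a scale of ten, so the averages are comparable only within a subject.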





The only question for which even half the teachers could duplicate
their mark on second grading was the second one in geography,
where fifty-five agreed both times that it was valueless. Omitting
this second question in geography, the situation with reference to the
other four may be summarized in the statements:


No teachers duplicated their marks on all four questions

2 teachers duplicated their marks on only three questions
17 teachers duplicated their marks on only two questions
23 teachers duplicated their marks on only one question
19 teachers duplicated their marks on none of the questions

The lack of reliability may be compactly expressed in terms of
reliability coefficients, computed from the scatter diagrams, by the
ordinary Pearson product-moment method. With their probable
errors, they are as follows:


........................................ 0.25 ± 0.08
........................................ 0.51 ± 0.06
........................................ 0.31 ± 0.08
........................................ 0.39 ± 0.07


It is unnecessary to state that reliability coefficients as low as 
these are little better than sheer guesses. Only a few of the teachers 
thought they were influenced by first grading. As far as they were, 
however, it would tend to make the correlations reported above 
spuriously high. The fallibility of human judgment, even when it is 
the same human judging the same material, is strikingly demonstrated. 
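
The probable errors attached to the coefficients above are consistent with the customary formula of the period, PE = .6745(1 − r²)/√N. The article does not state which formula was used, so the Python sketch below rests on that assumption; the function name is illustrative.

```python
from math import sqrt

def probable_error_of_r(r, n):
    """Probable error of a Pearson r under the classical formula .6745(1 - r^2)/sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)

# The four coefficients reported for the sixty-one teachers.
for r in (0.25, 0.51, 0.31, 0.39):
    print(f"{r:.2f} ± {probable_error_of_r(r, 61):.2f}")
```

With N = 61 this gives .08, .06, .08, and .07, matching the probable errors printed beside the four coefficients.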


SUMMARY 


Repeated grading of the same essay type of material by the same 
teachers (sixty-one) after an interval of eleven weeks is very unreliable. 
Reliability coefficients vary from 0.25 to 0.51. Variability of human 
judgment in the same individual is about the same as variability 
between different individuals. 





COMPUTING STATISTICAL COEFFICIENTS FROM 
PUNCHED CARDS 


R. M. MENDENHALL AND RICHARD WARREN 


Columbia University 


A new method for securing product moment correlations, means, 
standard deviations, distributions, and data for the calculating of 
means of arrays and of higher moments by means of standard Hollerith 
tabulating equipment was discovered by the writers while working on 
the Pennsylvania Study data of the Carnegie Foundation for the 
Advancement of Teaching. 

This method has been described in an elementary form in a mono- 
graph published by the Columbia University Statistical Bureau, 
entitled “The Mendenhall-Warren-Hollerith Correlation Method.” 
The purpose of this article is to give the mathematical aspects of this 
correlation method, and to indicate the extensions necessary in calcu- 
lating higher moments and product moments. 

The calculation of the correlation coefficient involves a function of 
the sums of two variables, the sums of the squares, and the sums of 
their cross products. The usual method of obtaining the products 
has involved actual multiplication of the deviations as measured from 
an arbitrary origin. The same results may be obtained by taking the 
sums of progressive totals which are easily yielded by mechanical 
tabulating equipment. 

The method of summing for the purpose of obtaining sums of
powers of a series of numbers is given in West’s “Introduction to
Mathematical Statistics” and also in Whittaker and Robinson’s
“Calculus of Observations.” The same method can be extended to
give sums of the cross products of two series of numbers. The
following notation will be used:


$\bar{X}, \bar{Y}$ = Mean values of the X and Y variables, etc.
$\sigma_x, \sigma_y$ = Standard deviations of X and Y.
$X_n$ = Score of n in trait X; i.e., $X_5$ = Score of 5 in trait X.
$Y_{x_n}$ = The Y score of an individual having a score in trait X equal
to n, i.e., $Y_{x_3}$ = Y score of an individual whose X score is 3.
$\Sigma X$ = Sum of all scores in trait X.
$SY_{x_n}$ = Sum of the Y scores for individuals whose X scores are equal
to n, i.e., $SY_{x_5}$ = the sum of the Y scores for individuals who
made a score of 5 in trait X.

Suppose for example that cards on which are punched values of
several variables, such as X, Y, Z, etc., are sorted in order by the X
variable. Let $X_n$ be the highest value in this trait, $X_{n-1}$ the next
highest, etc., on down to 3, 2, 1. Let $Y_{x_n}$ be a value of Y for a score
of n in the X trait, $Y_{x_{n-1}}$ the Y score of an individual who makes a
score of n − 1 in the X trait, $Y_{x_3}$ the Y score of an individual
who makes a score of 3 in trait X, etc. The cards, after ordering from
highest values of X down, are run through a printing tabulator wired
to control and indicate X scores in counter 1, give a card count or
frequency count per interval in counter 2, a cumulative sum of the X
variable in counter 3, a cumulative sum of the Y variable in counter 4,
a cumulative sum of the Z variable in counter 5, etc., until the capacity
of the machine is reached.1

The Hollerith machine is automatically controlled in such a way 
that as the cards run through, the values are progressively totaled until 
there is a change in the value of the X variable, at which point the 
tabulator stops feeding cards, prints the number of cases in that 
interval and the cumulative sums of the X, Y, Z, etc., variables to 
that point, and starts again, indicating the first card of the next group, 
repeating the above process until all the cards have been tabulated. 
The tabulation is very fast, about 100 to 150 cards being tabulated 
per minute. 


Table A shows symbolically the operations which the machine 
performs in the first four counters only. 

1. It will be noticed that the last row in Column III gives
$SX_n + SX_{n-1} + SX_{n-2} + \cdots + S(3) + S(2) + S(1)$, which is equal to
$\Sigma X$. Since zero is taken as an arbitrary origin the mean for X,
$\bar{X} = \Sigma X/N$.

2. If Column III is added, $SX_n$ is added in n times, $SX_{n-1}$ is added
n − 1 times, etc., on down to S(3) which is added three times, S(2)
twice, and S(1) once. The sum of the progressive totals is represented
by $nSX_n + (n-1)SX_{n-1} + (n-2)SX_{n-2} + \cdots + 3S(3) + 2S(2) + S(1)$,
which is equal to the sum of the squares of the X variable,





1 For a complete description of the correlation method, including the set-up 
of the data, wiring diagrams for the printing tabulator, sorting directions, sample 
tables, computation form, see The Columbia University Statistical Bureau, Docu- 
ment No. 1, Sept., 1929, The Mendenhall-Warren-Hollerith Correlation Method. 
Published by the Columbia University Statistical Bureau. 








TABLE A

Counter 1 | Counter 2 | Counter 3 | Counter 4
I | II | III | IV
Score indication | Frequency | Progressive total of X | Progressive total of Y
$X_n$ | $f(X_n)$ | $SX_n$ | $SY_{x_n}$
$X_{n-1}$ | $f(X_{n-1})$ | $SX_n + SX_{n-1}$ | $SY_{x_n} + SY_{x_{n-1}}$
$X_{n-2}$ | $f(X_{n-2})$ | $SX_n + SX_{n-1} + SX_{n-2}$ | $SY_{x_n} + SY_{x_{n-1}} + SY_{x_{n-2}}$
. . . | . . . | . . . | . . .
3 | $f(3)$ | $SX_n + SX_{n-1} + \cdots + S(3)$ | $SY_{x_n} + SY_{x_{n-1}} + \cdots + SY_{x_3}$
2 | $f(2)$ | $SX_n + SX_{n-1} + \cdots + S(3) + S(2)$ | $SY_{x_n} + SY_{x_{n-1}} + \cdots + SY_{x_3} + SY_{x_2}$
1 | $f(1)$ | $SX_n + SX_{n-1} + \cdots + S(3) + S(2) + S(1)$ | $SY_{x_n} + SY_{x_{n-1}} + \cdots + SY_{x_3} + SY_{x_2} + SY_{x_1}$
Sum of column | N | $nSX_n + (n-1)SX_{n-1} + \cdots + 3S(3) + 2S(2) + S(1) = \Sigma X^2$ | $nSY_{x_n} + (n-1)SY_{x_{n-1}} + \cdots + 3SY_{x_3} + 2SY_{x_2} + SY_{x_1} = \Sigma XY$

56 The Journal of Educational Psychology 


or $\Sigma X^2$. From the sum of the squares, and the mean, the standard
deviation is obtained in the usual way from the formula:

$$\sigma_x = \sqrt{\frac{\Sigma X^2}{N} - \bar{X}^2}$$


If the X scores do not extend down to a score of one a correction 
is necessary in order to shift the arbitrary origin to zero. Column III 
is added down to, but not including, the last progressive total. The 
last progressive total is multiplied by the score indication at the left 
and added to the sum of Column III to that point. 

3. The last row in Column IV gives the sum of the Y variable and
is represented by $SY_{x_n} + SY_{x_{n-1}} + \cdots + SY_{x_3} + SY_{x_2} + SY_{x_1}$,
or $\Sigma Y$. The mean of Y is given by $\Sigma Y/N$.

4. Summing Column IV, $SY_{x_n}$ is added in n times, $SY_{x_{n-1}}$ is added
in (n − 1) times, $SY_{x_3}$ three times, $SY_{x_2}$ twice, etc. The sum then
is $nSY_{x_n} + (n-1)SY_{x_{n-1}} + \cdots + 3SY_{x_3} + 2SY_{x_2} + SY_{x_1} = \Sigma XY$.
The same correction to the zero point is made as in the previous
case, i.e., if the series of scores in X stop at a number K the cross
product is obtained by adding Column IV to, but not including, the
row designated by K and adding on K times $\Sigma Y$.

5. The next operation is to sort the cards on the Y variable and 
obtain a distribution for Y and progressive totals of Y from which the 
mean and standard deviation for this variable are obtained. 


The correlation coefficient then may be calculated according to the
formula

$$r_{xy} = \frac{\dfrac{\Sigma XY}{N} - \bar{X}\bar{Y}}{\sigma_x \sigma_y}$$
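
The five steps above can be imitated in a few lines of Python. This is only a sketch of the summation idea, not a model of the tabulator wiring: the function names are illustrative, scores are assumed to be non-negative integers, and the X scores in the example are hypothetical. The Y scores are those of the second tabulation of the sample problem.

```python
from math import sqrt

def progressive_sums(xs, ys):
    """Return (sum_x, sum_x2, sum_y, sum_xy) using running totals only."""
    sx = {}  # SX_n: sum of the X scores on cards whose X score is n
    sy = {}  # SY_xn: sum of the Y scores on cards whose X score is n
    for x, y in zip(xs, ys):
        sx[x] = sx.get(x, 0) + x
        sy[x] = sy.get(x, 0) + y
    run_x = run_y = 0   # counters 3 and 4: progressive totals
    col3 = col4 = 0     # additions of Columns III and IV
    for score in range(max(xs), 0, -1):  # every score down to 1, empty ones included
        run_x += sx.get(score, 0)
        run_y += sy.get(score, 0)
        col3 += run_x
        col4 += run_y
    run_y += sy.get(0, 0)  # a zero row enters the last total but not the column sum
    return run_x, col3, run_y, col4

def correlation(xs, ys):
    n = len(xs)
    sum_x, sum_x2, sum_y, sum_xy = progressive_sums(xs, ys)
    _, sum_y2, _, _ = progressive_sums(ys, ys)  # second sort, on the Y variable
    mean_x, mean_y = sum_x / n, sum_y / n
    sd_x = sqrt(sum_x2 / n - mean_x ** 2)
    sd_y = sqrt(sum_y2 / n - mean_y ** 2)
    return (sum_xy / n - mean_x * mean_y) / (sd_x * sd_y)

ys = [6, 6, 5, 4, 4, 4, 3, 3, 1, 0]   # Y distribution of the sample problem
xs = [7, 5, 5, 4, 4, 3, 3, 2, 2, 3]   # hypothetical X scores, for illustration only
print(progressive_sums(ys, ys))
print(correlation(xs, ys))
```

Looping over every score down to 1, including scores that no card carries, plays the part of the blank card inserted for a missing score in the sample problem.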


A sample problem involving only ten cases is given in Table I to 
illustrate the simplicity of the method. 

Column I gives the score indication. A score of 6 was lacking in
test X and a blank card was inserted. (If this card had not been
inserted, it would be necessary in adding the columns to add the 7
twice in the X column and the 6 twice in the Y column.)

Column II gives the frequency of each score. 

Column III gives the progressive totals of the scores in test X. 

Column IIIa would not appear in a tabulation. It is inserted 
only to show the construction of the numbers in Column III. 


Column IV gives the progressive totals of the Y scores when 
arranged in order by the X variable. 





Column IVa would not appear in a tabulation. It is included
merely to show the construction of the numbers in Column IV.


TABLE I.—Continued
Second Tabulation, Y Distribution

   I      II      III
   Y     f(Y)    Progressive total of Y
   6      2       12
   5      1       17
   4      3       29
   3      2       35
   2      0       35
   1      1       36
   0      1       36
 Total   10      164 = $\Sigma Y^2$

The last 36 should not be included in the sum of Column III.

$$\sigma_y = \sqrt{\frac{164}{10} - 3.6^2} = 1.84 \qquad r_{xy} = \frac{\dfrac{\Sigma XY}{10} - 3.8 \times 3.6}{1.46 \times 1.84} = .79$$


Diagram of Procedure.—The following diagram summarizes 
graphically the steps in the six-variable job which was used to illustrate 
the procedure of computing intercorrelations by the M-W-Hollerith 
method. 

For curve fitting, either of frequency curves or non-linear regression 
lines where higher moments or product moments are necessary, the 
summation method when applied to tabulating machines reduces 
considerably the amount of labor involved. At the present time hand 
additions of the progressive totals are necessary. A new machine is 
being built which will give automatically the sums of cumulative sums 
up to the ninth order. From these sums all the moments up to the 
eighth order can be computed if desired. This machine can also be 
used as a difference tabulator and will be extremely useful in con- 
structing tables of various functions. 

DIAGRAM OF M-W-H PROCEDURE FOR COMPUTING INTERCORRELATIONS

1. Original data sheets with serial numbers and check-sums.
2. Data punched on cards by key-punch operator, 150 to 200 cards per hour.
3. Punched cards and data sheets: punched cards verified by operator using mechanical verifier, 150 to 200 cards per hour.
4. Verified cards: sorter and tabulator; sort 350 cards per minute, tabulate 150 cards per minute.
5. Tables of cumulative totals.
6. Adding machine operator adds cumulative total columns (or punch operator punches cumulative totals on cards and adds them on a tabulator); second adding machine operator checks addition of columns.
7. Tables with verified sums of columns.
8. Monroe calculator operator, using M-W-H form for computing intercorrelations, finds means, sigmas, and correlations, checking to balance with check-sum.
9. M-W-H form with finished computations; stenographer types report to investigator.

In fitting a second degree parabola of the type $y = ax^2 + bx + c$
the values $\Sigma X^2Y$, $\Sigma XY$, $\Sigma Y$, $\Sigma X$, $\Sigma X^2$, $\Sigma X^3$, $\Sigma X^4$, and N are necessary
for a solution of the normal equations involving a, b, and c, as
unknowns. By the use of the standard tabulator these sums can be
secured if additional accumulations are made in Column III and
Column IV of Table A. A card is punched for each score indication





























of X showing the accumulation of SX and SY to that point. Thus
on the first card the score value of $X_n$ is punched and also the numerical
equivalent of $SX_n$ and $SY_{x_n}$. On the second card the $X_{n-1}$ score,
$SX_n + SX_{n-1}$, and $SY_{x_n} + SY_{x_{n-1}}$. On the third card the score
$X_{n-2}$, $SX_n + SX_{n-1} + SX_{n-2}$, and $SY_{x_n} + SY_{x_{n-1}} + SY_{x_{n-2}}$, and so
on for the other values of X down to 1. There will be as many cards
as there are intervals of the variable X. If these cards are run through
the tabulator to a cumulative total the following table will result, i.e.,
accumulating Column III and Column IV of Table A we have:

The last row of Column II, Table B, gives $\Sigma X^2$ and the last row of
Column III, $\Sigma XY$. The sum of Column II gives $\dfrac{\Sigma X^3 + \Sigma X^2}{2}$, from
which $\Sigma X^3$ can be obtained by substituting for $\Sigma X^2$. $\Sigma X^2Y$ can be
obtained in a similar manner.

If cards are punched containing the values entered in Table B and
these values accumulated, Table C will result.

The last row of Table C gives the total of Columns II and III of 
Table B. The only hand addition necessary is the addition of Columns 
II and III of Table C. Even this operation can be done by punching 
cards containing the entries in this table and running them to a total. 

Since the values for $\Sigma X$, $\Sigma Y$, $\Sigma X^2$, $\Sigma X^3$, $\Sigma XY$, $\Sigma X^2Y$ have been
obtained, the values of $\Sigma X^4$ and $\Sigma X^3Y$ can be calculated very easily.
The normal equations are set up in the ordinary way. A discussion
of these equations is not here attempted as it can be found in several
sources. Also the algebraic manipulation necessary to transfer the
moments from zero to the mean is given by several writers.
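
The repeated accumulations described for Tables B and C can be sketched in Python as successive running sums of the SX column. The relation used for the third accumulation, that its column sum equals $(\Sigma X^4 + 3\Sigma X^3 + 2\Sigma X^2)/6$, is not printed in the article; it is worked out here from the same counting argument that gives $(\Sigma X^3 + \Sigma X^2)/2$ for Table B. The function names and the scores are illustrative assumptions.

```python
from itertools import accumulate

def cumulative_column_sums(xs, order):
    """Column sums of the first `order` successive accumulations of the SX column."""
    # SX for every score from the highest down to 1 (zero where a score is absent).
    column = [sum(x for x in xs if x == score) for score in range(max(xs), 0, -1)]
    sums = []
    for _ in range(order):
        column = list(accumulate(column))  # one more pass of progressive totals
        sums.append(sum(column))
    return sums

def power_sums(xs):
    """Recover sum X^2, sum X^3, sum X^4 from the three column sums."""
    t1, t2, t3 = cumulative_column_sums(xs, 3)
    sum_x2 = t1
    sum_x3 = 2 * t2 - sum_x2
    sum_x4 = 6 * t3 - 3 * sum_x3 - 2 * sum_x2
    return sum_x2, sum_x3, sum_x4

xs = [7, 5, 5, 4, 4, 3, 3, 2, 2, 3]  # hypothetical positive integer scores
print(power_sums(xs))
```

The cross-product sums $\Sigma X^2Y$ and $\Sigma X^3Y$ follow in the same way if the SY column is accumulated in place of the SX column.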

No numerical illustration of the method for securing higher 
moments will be given at the present time. 





SUMMARY 


Standard tabulating equipment can be used to great advantage in 
the following instances. 

1. Questionnaire analysis where counts of various types of responses 
are desired and particularly where counts of items by specified cate- 
gories are wanted. 

2. Item analysis of test questions where calculation of biserial
“r” is desired, or the mean scores of individuals who make various
responses to single questions.

3. Calculation of correlations involving several variables. 

4. Calculation of correlations based upon large populations. 

5. Calculation of statistical coefficients from populations which 
have been cross classified. 


6. Where speed and one hundred per cent accuracy are factors. 





THE PROBABLE ERROR OF A DIFFERENCE FORMULA 
KARL J. HOLZINGER 
University of Chicago 


In the October number of this Journal there appears a new probable
error formula by Pratt, Dunlap and Cureton. The formula given
is grossly in error owing to apparent unfamiliarity with algebra.
On page 498 of the above article, the product $\sigma_1\sigma_2$ is factored out of
the summation as if it were a constant. Such simplification is
inadmissible because $\sigma_1$ and $\sigma_2$ here represent variables under fluctuations
of sampling.


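The correct formula is very easy to obtain.1 Using the notation
of the above article, the probable error of

$$y = t_2 - t_1 = \frac{M_2 - Z_2}{\sigma_2} - \frac{M_1 - Z_1}{\sigma_1}$$

is required. The Z’s are taken as constants.
Differentiating this expression, we have

$$dy = \frac{\sigma_2\,dM_2 - (M_2 - Z_2)\,d\sigma_2}{\sigma_2^2} - \frac{\sigma_1\,dM_1 - (M_1 - Z_1)\,d\sigma_1}{\sigma_1^2}$$

Squaring both members of this equation, and using square brackets
to denote the sum for all samples divided by the number of samples
we have

$$[d^2y] = \frac{\sigma_1^2[d^2M_1] + (M_1 - Z_1)^2[d^2\sigma_1] - 2\sigma_1(M_1 - Z_1)[dM_1\,d\sigma_1]}{\sigma_1^4}$$
$$\qquad + \frac{\sigma_2^2[d^2M_2] + (M_2 - Z_2)^2[d^2\sigma_2] - 2\sigma_2(M_2 - Z_2)[dM_2\,d\sigma_2]}{\sigma_2^4}$$
$$\qquad - \frac{2\{\sigma_1\sigma_2[dM_1\,dM_2] + (M_1 - Z_1)(M_2 - Z_2)[d\sigma_1\,d\sigma_2] - \sigma_1(M_2 - Z_2)[dM_1\,d\sigma_2] - \sigma_2(M_1 - Z_1)[dM_2\,d\sigma_1]\}}{\sigma_1^2\sigma_2^2}$$

This last expression may be simplified by the usual substitutions
with Pearson’s formulas.2 We have here assumed normal distributions
so that $\beta_1 = 0$,

$$[dM_1\,dM_2] = \frac{r_{12}\sigma_1\sigma_2}{N}, \qquad [d\sigma_1\,d\sigma_2] = \frac{r_{12}^2\sigma_1\sigma_2}{2N}, \qquad [d\sigma_1\,dM_1] = 0,$$

and

$$[d\sigma_1\,dM_2] = 0.$$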


1 This formula was derived by the writer in a paper prepared for the
International Congress of Psychology at Yale.

2 Pearson, Karl: On the Probable Errors of Frequency Constants. Biometrika,
Vol. IX, p. 1.

Making these substitutions, we then have

$$\sigma_y^2 = \frac{1}{2N}\left[4 - 4r_{12} + t_1^2 + t_2^2 - 2r_{12}^2\,t_1t_2\right] \qquad (1)$$

The probable error is, of course,

$$PE_y = .6745\sqrt{\frac{4 - 4r_{12} + t_1^2 + t_2^2 - 2r_{12}^2\,t_1t_2}{2N}} \qquad (2)$$

The above formula was derived by the writer for just such problems
as those described by Pratt, Dunlap and Cureton. The need for
such a formula was first suggested by an article by Gates1 in which
quantities of the form $M_1/\sigma_1$ were compared incorrectly. Gates
gives some probable errors, but they are also incorrect.
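
For readers who wish to apply formula (2), a minimal Python sketch follows. It assumes that $t_1$ and $t_2$ stand for $(M_1 - Z_1)/\sigma_1$ and $(M_2 - Z_2)/\sigma_2$, and that the expression under the radical is the one obtained by substituting the Pearson values listed above together with the usual normal-sampling results $[d^2M] = \sigma^2/N$ and $[d^2\sigma] = \sigma^2/2N$. The function and argument names are illustrative.

```python
from math import sqrt

def probable_error_of_difference(t1, t2, r12, n):
    """Probable error of y = t2 - t1, where t = (M - Z) / sigma for each test."""
    variance = (4.0 - 4.0 * r12 + t1 ** 2 + t2 ** 2
                - 2.0 * r12 ** 2 * t1 * t2) / (2.0 * n)
    # Guard against a tiny negative value produced by floating-point rounding.
    return 0.6745 * sqrt(max(variance, 0.0))

# Two uncorrelated tests, both means at their constants, 100 cases.
print(probable_error_of_difference(0.0, 0.0, 0.0, 100))
```

Two limiting cases give a quick check on the expression: with perfectly correlated tests and equal t values the probable error vanishes, and with $t_1 = t_2 = 0$ and $r_{12} = 0$ it reduces to $.6745\sqrt{2/N}$.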





1 Gates, Arthur I.: A Critique of Methods for Estimating and Measuring
Transfer of Training. Journal of Educational Psychology, December, 1924.





SHORT METHOD FOR FINDING ZERO ORDER
COEFFICIENTS OF CORRELATIONS1


J. F. WALKER 
University of Arizona 


The usual method of finding the coefficient of correlation requires 
the preliminary work of tabulation for a correlation table and from 
this table the means, sigmas and coefficient of correlation are then 
worked out. 

In case the total number of frequencies is small, resort may be made 
to the use of the crude score method, but that method is not satis- 
factory unless computing machines or computing tables are easily 
available. 

Because of the rather large amount of labor required in finding the 
coefficient of correlation by either of these methods, a simpler technique 
is desirable and the following is offered as a promising modification of 
methods now in use. 

In working partial correlations, the most tedious and probably 
the most inaccurate work is done in making the tabulations for the 
correlation tables which are required in finding the zero order coeffi- 
cients of correlation. The advantages of the proposed method are 
particularly observable in the solution of such a problem as would 
be met in working out partial correlations where several variables 
had been measured. For this reason a problem having four variables 
and six zero order coefficients has been chosen for solution. 

The method is shorter than either the correlation table or crude 
score technique even when only two sets of variables are used, but a 
study of the solution will show that the greater the number of sets of 
variables used, the greater is the saving of time. 

The scores V, X, Y, and Z represent scores in intelligence,
arithmetic computation, arithmetic reasoning, and comprehension in
reading, respectively, while v′, x′, y′, and z′ represent deviations of
scores from the guessed means.

The formula used for finding the correction is

$$c = \frac{\Sigma f x'}{n}$$







1 It represents a collaboration of Mr. George W. Plumleigh of Espanola, New
Mexico and the writer, and results from suggestions made by the former while a
student in the summer session of the University of Arizona.

  V   v'    X   x'    Y   y'    Z   z'   v'x'  v'y'  v'z'  x'y'  x'z'  y'z'
 84    2   26    1   55    2   44    2     2     4     4     2     2     4
 78    1   20   -1   50    0   41    1    -1     0     1     0    -1     0
 57   -3   16   -2   40   -3   32   -2     6     9     6     6     4     6
 55   -3   20   -1   42   -2   37    0     3     6     0     2     0     0
 92    4   30    3   60    4   46    3    12    16    12    12     9    12
 62   -2   18   -1   50    0   36    0     2     0     0     0     0     0
 86    3   24    1   54    2   43    2     3     6     6     2     2     4
 68   -1   19   -1   48    0   35   -1     1     0     1     0     1     0
 73    0   20   -1   45   -1   38    0     0     0     0     1     0     0
 85    3   26    1   55    2   42    2     3     6     6     2     2     4
 68   -1   19   -1   48    0   35   -1     1     0     1     0     1     0
 73    0   20   -1   45   -1   38    0     0     0     0     1     0     0
 85    3   26    1   55    2   42    2     3     6     6     2     2     4
 50   -4   20   -1   44   -2   27   -3     4     8    12     2     3     6
 82    2   19   -1   50    0   40    1    -2     0     2     0    -1     0
 62   -2   16   -2   43   -2   34   -1     4     4     2     4     2     2
 78    1   18   -1   42   -2   37    0    -1    -2     0     2     0     0
 83    2   20   -1   55    2   42    2    -2     4     4    -2    -2     4
 68   -1   15   -2   40   -3   38    0     2     3     0     6     0     0
 85    3   23    0   50    0   40    1     0     0     3     0     0     0
 75    1   20   -1   45   -1   36    0    -1    -1     0     1     0     0
 49   -5   16   -2   40   -3   30   -2    10    15    10     6     4     6
 64   -2   17   -2   43   -2   35   -1     4     4     2     4     2     2
 90    4   34    4   55    2   46    3    16     8    12     8    12     6
 82    2   22    0   50    0   45    3     0     0     6     0     0     0
 55   -3   18   -1   48    0   35   -1     3     0     3     0     1     0
 87    3   25    1   50    0   42    2     3     0     6     0     2     0
 78    1   22    0   50    0   38    0     0     0     0     0     0     0
 66   -1   18   -1   45   -1   35   -1     1     1     1     1     1     1
 58   -3   17   -2   40   -3   28   -3     6     9     9     6     6     9
 91    4   32    3   59    3   48    4    12    12    16     9    12    12
Sums                                      94   118   131    77    64    82

   V      f    v'   fv'   fv'^2
 90-94    3    4    12     48
 85-89    5    3    15     45
 80-84    4    2     8     16
 75-79    4    1     4      4
 70-74    2    0     0      0
 65-69    4   -1    -4      4
 60-64    3   -2    -6     12
 55-59    4   -3   -12     36
 50-54    1   -4    -4     16
 45-49    1   -5    -5     25
 Total   31          8    206

   X      f    x'   fx'   fx'^2
 33-35    1    4     4     16
 30-32    2    3     6     18
 27-29    0    2     0      0
 24-26    5    1     5      5
 21-23    3    0     0      0
 18-20   14   -1   -14     14
 15-17    6   -2   -12     24
 Total   31        -11     77

   Y      f    y'   fy'   fy'^2
 60-62    1    4     4     16
 57-59    1    3     3      9
 54-56    6    2    12     24
 51-53    0    1     0      0
 48-50   10    0     0      0
 45-47    4   -1    -4      4
 42-44    5   -2   -10     20
 39-41    4   -3   -12     36
 Total   31         -7    109

   Z      f    z'   fz'   fz'^2
 48-50    1    4     4     16
 45-47    3    3     9     27
 42-44    6    2    12     24
 39-41    3    1     3      3
 36-38    8    0     0      0
 33-35    6   -1    -6      6
 30-32    2   -2    -4      8
 27-29    2   -3    -6     18
 Total   31         12    102

$c_v = \frac{8}{31} = .258$, $M_v = 73.79$, $\sigma_v = \sqrt{\frac{206}{31} - .258^2} = 2.564$
$c_x = \frac{-11}{31} = -.354$, $M_x = 21.44$, $\sigma_x = \sqrt{\frac{77}{31} - (-.354)^2} = 1.535$
$c_y = \frac{-7}{31} = -.226$, $M_y = 48.82$, $\sigma_y = \sqrt{\frac{109}{31} - (-.226)^2} = 1.86$
$c_z = \frac{12}{31} = .387$, $M_z = 38.66$, $\sigma_z = \sqrt{\frac{102}{31} - .387^2} = 1.772$

$r_{vx} = \dfrac{\frac{94}{31} - (.258)(-.354)}{(2.564)(1.535)} = .79$
$r_{vy} = \dfrac{\frac{118}{31} - (.258)(-.226)}{(2.564)(1.86)} = .81$
$r_{vz} = \dfrac{\frac{131}{31} - (.258)(.387)}{(2.564)(1.772)} = .91$
$r_{xy} = \dfrac{\frac{77}{31} - (-.354)(-.226)}{(1.535)(1.86)} = .84$
$r_{xz} = \dfrac{\frac{64}{31} - (-.354)(.387)}{(1.535)(1.772)} = .81$
$r_{yz} = \dfrac{\frac{82}{31} - (-.226)(.387)}{(1.86)(1.772)} = .83$

The formula used for finding sigma in intervals is

$$\sigma = \sqrt{\frac{\Sigma f x'^2}{n} - c^2}$$

and that used for finding the coefficient of correlation is

$$r = \frac{\dfrac{\Sigma x'y'}{n} - c_x c_y}{\sigma_x \sigma_y}$$


Following is an analysis of the solution: The variables are first 
arranged in parallel columns with space left between them for the later 
introduction of their deviations. 

Below these columns, frequency distributions of the various 
variables are made, using whatever step intervals are found suitable 
for each variable. These step intervals need not be the same for 
different variables. 

Using the ‘“‘guessed mean” method, the deviation of each interval 
is next recorded and the sigmas for each variable computed. Using 
the deviation values thus secured, the proper deviation is next placed 
opposite each score in the columns of variables at the beginning of 
the problem. 

The products of the paired deviations, v′ and x′, are now found and
placed in the v′x′ column, and similarly for the v′y′, v′z′, x′y′, x′z′, and
y′z′ columns.

The algebraic sum of these columns having been obtained, all data 
necessary for the solution of the six coefficients of correlation is avail- 
able and no work has been done which would not have been required 
had correlation tables been used, while the labor necessary to make six 
different correlation tables has been avoided. 
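
The procedure just described can be sketched in Python. This is only an illustration under stated assumptions, not the authors' worksheet: the function names and the small data set are hypothetical, and every deviation is assumed to be already expressed in step intervals from its variable's guessed mean.

```python
from itertools import combinations
from math import sqrt


def sigma_in_intervals(devs):
    """Sigma in step intervals: sqrt(sum(d^2)/n - c^2), where c = sum(d)/n."""
    n = len(devs)
    c = sum(devs) / n
    return sqrt(sum(d * d for d in devs) / n - c * c)


def correlation(dx, dy):
    """r = (sum(x'y')/n - c_x*c_y) / (sigma_x*sigma_y) from paired deviations."""
    n = len(dx)
    cx, cy = sum(dx) / n, sum(dy) / n
    mean_product = sum(a * b for a, b in zip(dx, dy)) / n
    return (mean_product - cx * cy) / (sigma_in_intervals(dx) * sigma_in_intervals(dy))


def all_pairs(deviations):
    """Correlate every pair of variables from one set of deviation columns."""
    return {
        (a, b): correlation(deviations[a], deviations[b])
        for a, b in combinations(sorted(deviations), 2)
    }


# Hypothetical deviation columns (in step intervals) for four variables.
deviations = {
    "v": [-1, 0, 1, 2, 0],
    "x": [-2, 0, 2, 4, 0],
    "y": [1, 0, -1, -2, 0],
    "z": [0, 1, -1, 0, 2],
}
coefficients = all_pairs(deviations)
```

With four variables the dictionary holds six coefficients, one for each product column, and each deviation column is written only once, which is the saving of labor the method claims.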











A NOMOGRAPH FOR ESTIMATING A RELIABILITY 
COEFFICIENT BY THE SPEARMAN-BROWN 
FORMULA AND FOR COMPUTING ITS 
PROBABLE ERROR 


EDWARD E. CURETON AND JACK W. DUNLAP 
Territorial Normal and Training School, Honolulu, Hawaii 


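The Spearman-Brown formula gives the estimate of $r_{11}$ when
$r_{\frac12\frac12}$ is known, as

\[ r_{11} = \frac{2\, r_{\frac12\frac12}}{1 + r_{\frac12\frac12}} \]

Shen¹ gives the standard error of this $r_{11}$ as,

\[ \sigma_{r_{11}} = \frac{2(1 - r_{11})}{\sqrt{N}} \]

so,

\[ PE_{r_{11}} = \frac{1.349(1 - r_{11})}{\sqrt{N}} \]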


$r_{11}$ may be found for any value of $r_{\frac12\frac12}$ from the double line graph
(the second scale from the left).

The nomograph is so constructed that as either N or $r_{11}$ gets larger
the accuracy is increased. By interpolation, one can easily enter with
any desired N or $r_{11}$ and get $PE_{r_{11}}$ with an error not greater than .002.

To use the nomograph, find N on the left hand scale and $r_{11}$ on the
right center scale; connect these two points by means of a thread,
straightedge, or hairline on a celluloid strip; and where the right hand
scale is cut read $PE_{r_{11}}$.


As the scales are logarithmic, they are divided into tenths, fifths, 
and halves depending on the section of the scale. In reading one must 
take into consideration the subdivisions of that particular section. 
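
The quantities read from the nomograph can also be computed directly. The following Python sketch is offered only as a check on a reading; the function names and the sample values are hypothetical, and the probable error is assumed to be the conventional 0.6745 times the standard error given above.

```python
from math import sqrt


def spearman_brown(r_half):
    """Full-length reliability estimated from the correlation between halves."""
    return 2 * r_half / (1 + r_half)


def probable_error(r_full, n):
    """PE of the estimated reliability: 0.6745 * 2 * (1 - r_full) / sqrt(n)."""
    return 0.6745 * 2 * (1 - r_full) / sqrt(n)


r_half = 0.60  # hypothetical correlation between the two halves
n = 100        # hypothetical number of cases
r_full = spearman_brown(r_half)
pe = probable_error(r_full, n)
```

For these sample values the estimated reliability is .75, and the probable error shrinks as either N or the reliability increases, in line with the behavior described for the scales.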





1 Shen, E.: Standard Error of Certain Estimated Coefficients of Correlation.
Journal of Educational Psychology, Vol. XV, Oct., 1924, pp. 462-465.



RETEST OF THE PERSONALITY TRAITS OF A
GROUP OF GRADE VI CHILDREN 


WILLIAM D. BUCHANAN 


Principal of the Jackson School, St. Louis, Mo. 


This retest was suggested by the study of the results obtained by 
giving the Downey Group Test to one hundred children in Grade VI 
of the Dozier School, St. Louis. The original test results and a 
description of the whole procedure are recorded in an unpublished
Master’s thesis at the University of Chicago. During the preparation
of the thesis several questions arose. When school reopened that fall, 
thirty of the children were still at the Dozier. The thirty children were 
retested with the Downey Group Test and the results are presented in 
this report. 


PROCEDURE 


The training given the teachers before giving the retest and making 
the re-estimates for the thirty children consisted of a rather careful 
study of a group of six Grade VI children not included in the group 
of one hundred children originally tested. 

The estimates were prepared for the group of the six untested 
children by each of the two teachers separately. The average and the 
two estimates used in finding it were discussed in a conference. 

The Downey Group Test was then given to the group of the six 
untested pupils, and the results discussed with the teachers. 

After this study, the two teachers were asked to prepare separately 
estimates for the thirty pupils to be retested. The estimates were
made and the retest given six months after the original test.

The results were tabulated and the correlations, forty-eight in 
number, were computed. 

Question 1.—Could the correlation between teachers’ estimates of the 
various traits and the scores made upon the test be improved by giving the 
teachers more training in the use and understanding of the test? 

Table I gives the correlations of the ratings with the test scores for 
both the data used in the thesis and the data obtained upon the retest. 

By inspection of the table we find only four correlations that have 
much significance when we take the probable error into consideration. 

Further, we find no appreciable changes in the correlations for
eight of the Downey traits.

Therefore, it would seem that the additional training in estimating
the Downey traits did not greatly affect the estimates.
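
For readers who wish to check a tabled probable error, the conventional formula PE = 0.6745(1 − r²)/√N can be evaluated directly. The formula is not stated in this report and is assumed here; N = 30 corresponds to the thirty retested children, and the function names are illustrative only.

```python
from math import sqrt


def probable_error_of_r(r, n):
    """Conventional probable error of a correlation coefficient (assumed formula)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


def ratio_to_pe(r, n):
    """How many probable errors the coefficient lies from zero."""
    return abs(r) / probable_error_of_r(r, n)


# A correlation of .78 on thirty cases has a PE of about .05.
pe_high = round(probable_error_of_r(0.78, 30), 2)
# A correlation of .34 on thirty cases has a PE of about .11.
pe_fair = round(probable_error_of_r(0.34, 30), 2)
```

The ratio of a coefficient to its probable error is what is weighed when a correlation is judged significant or not; no particular threshold is assumed in the sketch.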


TABLE I.—CORRELATIONS BETWEEN TEACHERS’ ESTIMATES OF THE DOWNEY
TRAITS AND SCORES MADE UPON THE TEST

                                  Thesis data       Retest data
Traits                             r      PE         r      PE
Speed of movement¹
Freedom from load                −.40    ±.08       .18    ±.12
Flexibility                       .02    ±.10      −.07    ±.12
Speed of decision                 .16    ±.10      −.08    ±.12
Motor impulsion                   .09    ±.10       .003   ±.12
Self-confidence                  −.06    ±.10       .02    ±.12
Non-compliance                   −.59    ±.06       .03    ±.12
Finality of judgment             −.17    ±.10      −.36    ±.11
Motor inhibition                 −.13    ±.10      −.15    ±.12
Interest in detail               −.22    ±.10      −.06    ±.12
Coordination of impulses          ...     ...       .26    ±.11
Volitional perseveration          .08    ±.10       .03    ±.12

1 No data.


Question 2.—Would further training in rating the Downey traits 
indicate that the teachers had not understood them when they prepared the 
ratings used in the thesis? 

In order to obtain some information upon the question, we
correlated the ratings made at the time the retest was given with those
used in the thesis.

These correlations are given in Table II. 

Inspection of the table shows that the correlations between the
ratings used in the thesis and those used in the retest are all quite
significant when we consider the PE except the one for Freedom from
load.

The correlations for most of the traits range from .59 to .75 and 
are high enough to indicate substantial agreement when the probable 
errors are taken into consideration. These relatively high correlations 
would seem to indicate that these traits were understood when the 
ratings for the thesis were made and that further training in their 
estimation was not necessary. 

The table shows relatively low correlations for Non-compliance,
Finality of judgment, Motor inhibition, and Interest in detail. The
probable explanation is that these traits were not correctly understood
by the teachers when the ratings used in the thesis were made.


TABLE II.—CORRELATIONS OF RATINGS OF DOWNEY TRAITS FROM THESIS AND
THE RETEST

Traits                             r      PE
Speed of movement                 .67    ±.07
Freedom from load                 .15    ±.12
Flexibility                       .75    ±.05
Speed of decision                 .72    ±.06
Motor impulsion                   .70    ±.06
Self-confidence                   .61    ±.07
Non-compliance                    .50    ±.09
Finality of judgment              .48    ±.09
Motor inhibition                  .45    ±.10
Interest in detail                .45    ±.10
Coordination of impulses          .62    ±.07
Volitional perseveration          .59    ±.07





Our answer to Question 2 would be that the teachers did not 
understand all the traits equally well or that the extra training given 
was not sufficient to cause them to change their estimates for some of 
the traits. 

I also believe that Table II shows that the change in ratings of the 
Downey traits was not due to a longer acquaintance (six months) with 
the class. 

If the longer acquaintance with the class had been the determining 
factor the correlations between the thesis ratings and the retest ratings 
should have varied approximately the same amount for each of the 
traits. 

Question 3.—How do the retest scores correlate with the test scores 
used in the thesis? 

In Table III, I shall give some data bearing upon this question. 

A glance at the table shows that the correlations fall into three 
groups. 

1. High correlation between the thesis and the retest Downey 
scores in Flexibility and Interest in detail. 

2. Fair correlation for Freedom from load, Speed of decision, Motor
inhibition, and Volitional perseveration.

3. Very low and insignificant correlation for Motor impulsion,
Self-confidence, Non-compliance, Finality of judgment, and Coordination
of impulses.

I feel that Table III throws some interesting light upon the test
and its use with young Grade VI children who have not yet developed 
their writing habits and are still in that stage when they do not ques- 
tion the teacher’s statements. 


TABLE III.—CORRELATIONS OF THE DOWNEY TEST SCORES FROM THE THESIS
AND THE RETEST

Traits                             r      PE
Speed of movement¹
Freedom from load                 .34    ±.11
Flexibility                       .78    ±.05
Speed of decision                 .42    ±.10
Motor impulsion                  −.02    ±.12
Self-confidence                   .04    ±.12
Non-compliance                   −.02    ±.12
Finality of judgment             −.16    ±.12
Motor inhibition                  .36    ±.11
Interest in detail                .92    ±.12
Coordination of impulses          .12    ±.12
Volitional perseveration          .39    ±.10

1 No data.

SUMMARY 


1. From a comparison of the correlations of teachers’ ratings of 
the Downey traits and scores made upon the retest it would seem that 
the further training accomplished but little. 

2. From a study of the correlations of the teachers’ ratings of the
Downey traits used in the thesis and those used in the retest, it would
seem that the further training did affect the ratings of some of the
traits.

3. Table III shows that several of the traits gave significant correla- 
tions on the retest. Four traits did not. 

4. The data from the retest seem to bear out the opinion expressed
in the thesis that the Downey group test is not well suited for use with
young Grade VI children.



The Neuroses, by Israel S. Wechsler. Philadelphia: W. B. Saunders
Co., 1929. Pp. 330. 


The neuroses are, as Dr. Wechsler tells us in the Preface, “the
most ill-defined and least understood clinical entities in the whole
domain of medicine.” Concerning their dynamics there is probably
as much disagreement as for any aspect of abnormal psychology.
To write about them in a manner that will be satisfactory to individuals
from the many opposing schools is practically impossible. In this
book the author discusses many points of view but he is in no sense 
an eclectic. He has come to some very definite conclusions and he 
expresses these in no uncertain terms. 

Despite training in scientific medicine where the emphasis is on
the somatic or organic point of view, the author sees in psychopathology
the best source for the understanding of the neuroses and
“more or less consistently” adheres to such an interpretation in his
practice. His particular brand of psychopathology is clearly revealed
by the following quotation:


Numerous workers, more particularly Jung, Abraham, Ferenczi, Jones, Adler,
Rank, Brill and a few others have contributed to the growth of psychoanalysis
(some of them adding weeds), but more than ninety per cent of what constitutes
genuine psychoanalysis was evolved by the originator himself.


Small wonder that the author’s chapter on the neuroses included 
in his large Textbook of Clinical Neurology which is the core of the 
present volume ‘‘has been honored by personal appreciation of Pro- 
fessor Freud.” 

The Janet school he characterizes as “an intellectual approach to
the neuroses without regard to ‘instincts’ or dynamic motivation.”
And in this school he includes Morton Prince whose investigations he
describes as representing a “more refined stage of what is known as
purely descriptive psychology.” Not a word in the book about
Prince’s conception of the rôle of meaning, setting or context in
psychoneuroses!

Hollingworth’s concept of redintegration is mentioned in the 
appendix in a succinct chapter on General Intelligence, Mental Level 
and Psychoneuroses written by the brother of the author, Dr. David 
Wechsler, a psychologist. 

The content of the book can in part be inferred from the foregoing 
discussion. There are seven chapters in all. In Chapter I is a brief 
and interesting description of the history of psychiatry and the 
development of psychopathology. Mental Mechanism; Etiology of 
the Neuroses; Classification of the Neuroses; Clinical Manifestations 
of the Neuroses; the Diagnosis, Course, and Prognosis of the Neuroses; 
The treatment of the Neuroses—these, in order named, are the captions 
of the six ensuing chapters and indicate the organization as well as 
the selection of subject-matter presented. In the appendix are found 
Dr. David Wechsler’s contribution previously mentioned as well as 
his very concise treatment of psychometric tests and also a brief 
description of The History and Examination of Patients. The book 
is adequately indexed and carries a short bibliography of a selected list 
of references. 

“The Neuroses” though written primarily for medical students 
and practitioners is well suited to serve the needs of psychologists 
and others who want as intelligible a description of the Freudian inter- 
pretation of the dynamics of the neuroses as has been the fortune of 
the reviewer to read. H. MELTZER.

Psychiatric Clinic, Saint Louis. 





The Behavior-problem Boy, by Albert A. Owens. Department of
Superintendence, Board of Education, Philadelphia, Pa., 1929. Pp. 188.


Dr. Owens analyzed the records of 1373 behavior-problem boys 
who did not conform to the demands of regular elementary school 
programs. These boys were segregated in the Daniel Boone School, 
which is the disciplinary center for the entire city of Philadelphia. 
This monograph is based upon the analysis of registration and medical 
records; record of offenses; psychological examinations; personal 
interviews; and Whittier Scales for Grading Homes and Neighborhood
Conditions.

In one group of 365 boys studied, 1000 offenses were charged.
Truancy led as a cause for transfer from regular schools to the disci- 
plinary center. On the average, three years of misbehavior preceded 
segregation. As in similar studies, Dr. Owens found that he was 
dealing with a group that was subnormal mentally. The median 
IQ of the white boys was 75. It was 80 for the negroes. A foreign 
language was spoken in the homes of 48.4 per cent of the boys. An 
attempt was made to follow up former students. Only a third could 
be located. Evidently the behavior-problem boy tends to come from 
the unstable type of family that is constantly moving. 

Dr. Owens presents a list of twenty-nine characteristics of the 
behavior-problem boy. Some of the most significant seem to be 
that he is anti-social along several lines. Usual classroom activities 
do not appeal to him. There is an even chance that he has a court 
record. The occupational status of his parents is poor and the 
majority of the mothers are gainfully employed. The study ends with 
seven constructive proposals. The need for more careful selection 
of teachers who are to study this type of boy is emphasized. The 
desirability of analyzing the success records of boys who have been 
returned to regular grades is pointed out. 

This monograph contains ten excellent case studies. It has fifty 
tables and eleven illustrations, and is adequately indexed. Altogether, 
it is a contribution of a very high order. RICHARD S. UHRBROCK.

Cornell University. 





The Psychology of Adolescence, by Fowler D. Brooks. Boston: Houghton 
Mifflin and Co., 1929. Pp. XXIII + 652. $3.00. 


This is a new volume in the well-known Riverside Texts in Education
series which is edited by Ellwood P. Cubberley. The author is
a professor at Johns Hopkins University and has given courses in 
adolescence at the summer sessions of the University of Wisconsin. 
The text is somewhat larger than other texts in the series and is done 
in the familiar binding on a quality of paper which seems superior to 
that formerly used. The table of contents lists the principal topics 
discussed in each chapter. There are two separate lists of tables and
figures which give the titles and page reference of each table or figure;
there is an index; lengthy bibliographies with noteworthy references
starred are found at the end of each chapter in addition to footnote
references; “Problems for Study” are found in each chapter, making
the book useful for classroom or group discussion; one or two
chapters contain summaries; an excellent glossary of technical terms 
precedes the index. 

In his preface the author says, ‘‘The Psychology of Adolescence has 
the task of describing adolescent nature, growth, and development so as 
to facilitate both reliable prediction and suitable guidance and control 
of behavior during the teens.” A book that would carry out the 
implications of the title which this one bears would be useful to anyone 
dealing with those in the teen-age. The book will be most used by 
those who teach adolescents but it does not ignore the problem of 
the employed adolescent. 

There can be no doubt as to the value of this treatise. It does not 
subscribe to the belief of the birth of a new self as one of the chief 
characteristics of adolescence but amasses a great volume of research 
to sustain the contention “that development is a continuous function
throughout childhood and into adolescence; that, in fact, the roots of
his [the adolescent’s] present nature lie deeply imbedded in his past.”
Again (on p. 83), “We can find little sound evidence, however,
showing that any of these specific capacities [of mental growth] are
subject to sudden changes in rate of growth; or, indeed, . . . that the
development of one of them is generally at the expense of some other;
i.e., accompanied by a loss or decrease in some other, as the doctrine
of compensation really implies.” The first six chapters, which include
the meaning and significance of adolescence, growth in bodily size,
the development of physical and motor capacities, mental development
during adolescence, the growth of intelligence, and the correlation and
significance of physical and mental growth during adolescence, are
very well done and probably constitute the best executed portion of
the book.

There is a thorough quantitative inventory of growth in bodily 
size with useful comparisons from birth to maturity in the second 
chapter. The thoroughness with which the growth of intelligence is 
treated is a good index of the scholarship of this book. Chapter VI 
includes a discussion of the coefficients of correlation and alienation 
which should be helpful to many teachers struggling with the problem 
of comprehending educational statistics, because the discussion has 
a very practical application in the chapter. The same may be said 
of the explanation of the regression equation which is given in the chapter
on the prediction of adolescent behavior. Chapter VI gives evidence
to show that scholastic prediction from physical measures
is not warranted by present research. The relation of instincts and
emotions to adolescence, and learning and forgetting are taken up in 
that order. The chapter on adolescent interests gives a good list 
of interest research which has been made; the discussion of the means 
of developing interests should be useful. The treatment of the moral 
and religious life of the adolescent is decidedly temperate and may hold 
surprises for some. The rest of the book, seven chapters, is given over 
principally to the study of personality, its disturbances and guidance. 
Some of this work seems ponderous but if it serves to awaken a 
larger group of schoolmen to the need for more expert help on school 
staffs for the prevention of mental disturbances which are preventable, 
it will have served a worthy purpose. In this respect, one cannot 
help feeling that the book might very profitably be read by elementary 
school administrators who have in their hands the control over much 
that the adolescent becomes. Certainly it would serve to make them 
aware of a larger responsibility than is commonly shouldered. 
Any student of secondary education looking for a research problem 
will find a grand array of them served up here for the first who comes. 
One closes the book with the feeling that it is one which should have 
been printed and which will have a wide usefulness. J. H. COLEMAN.
Huntington Public Schools, Huntington, New York.